Genes encoding novel proteolytic enzymes

ABSTRACT

The invention relates to newly identified gene sequences that encode novel proteases obtainable from Aspergillus niger. The invention features the full length gene sequence of the novel genes, their cDNA sequences as well as the full-length functional protein and fragments thereof. The invention also relates to methods of using these enzymes in industrial processes and methods of diagnosing fungal infections. Also included in the invention are cells transformed with DNA according to the invention and cells wherein a protease according to the invention is genetically modified to enhance or reduce its activity and/or level of expression.

CROSS-REFERENCE TO RELATED APPLICATION

This application is the national phase of PCT application PCT/EP02/01984 having an international filing date of 22 Feb. 2002, and claims priority from European applications: 01205117.3 filed 21 Dec. 2001; 01204464.0 filed 15 Nov. 2001; 01000552.8, 01000553.6, 01000554.4, 01000556.9, 01000557.7 and 01000558.5 filed 22 Oct. 2001; 01000478.6 and 01000483.6 filed 20 Sep. 2001; 01000374.7 and 01000377.0 filed 16 Aug. 2001; 01000357.2 filed 9 Aug. 2001; 01000341.6, 01000342.4, 01000343.2 and 01000344.0 filed 2 Aug. 2001; 01000320.0, 01000321.8, 01000322.6, 01000323.4 and 01000327.5 filed 30 Jul. 2001; 01000280.6, 01000285.5, 01000286.3 and 01000287.1 filed 12 Jul. 2001; 01000234.3, 01000237.6, 01000238.4, 01000240.0, 01000242.6, 01000244.2 and 01000246.7 filed 21 Jun. 2001; 01000225.1 and 01000229.3 filed 20 Jun. 2001; 01000156.8, 01000159.2, 01000160.0, 01000162.6, 01000165.9, 01000166.7 and 01000168.3 filed 21 May 2001; 01000075.0, 01000078.4, 01000080.0, 01000084.2, 01000085.9, 01000087.5 and 01000088.3 filed 28 Mar. 2001; 01200707.6, 01200708.4, 01200719.1 and 01200706.8 filed 26 Feb. 2001; and 01200660.7, 01200658.1 and 01200657.3 filed 23 Feb. 2001. The contents of these documents are incorporated herein by reference.

FIELD OF THE INVENTION

The invention relates to newly identified polynucleotide sequences comprising genes that encode novel proteases isolated from Aspergillus niger. The invention features the full length nucleotide sequence of the novel genes, the cDNA sequences comprising the full length coding sequences of the novel proteases as well as the amino acid sequences of the full-length functional proteins and fragments and variants thereof. The invention also relates to methods of using these enzymes in industrial processes and methods of diagnosing fungal infections. Also included in the invention are cells transformed with a polynucleotide according to the invention and cells wherein a protease according to the invention is genetically modified to enhance or reduce its activity and/or level of expression.

BACKGROUND OF THE INVENTION

Proteolytic Enzymes

Proteins can be regarded as hetero-polymers that consist of amino acid building blocks connected by peptide bonds. The repetitive unit in proteins is the central alpha carbon atom with an amino group and a carboxyl group. Except for glycine, a so-called amino acid side chain substitutes one of the two remaining alpha carbon hydrogen atoms. The amino acid side chain renders the central alpha carbon asymmetric. In general, in proteins the L-enantiomer of the amino acid is found. The following terms describe the various types of polymerized amino acids. Peptides are short chains of amino acid residues with defined sequence. Although there is not really a maximum to the number of residues, the term usually indicates a chain whose properties are mainly determined by its amino acid composition and which does not have a fixed three-dimensional conformation. The term polypeptide is usually used for longer chains, usually of defined sequence and length and in principle of the appropriate length to fold into a three-dimensional structure. Protein is reserved for polypeptides that occur naturally and exhibit a defined three-dimensional structure. If the protein's main function is to catalyze a chemical reaction, it is usually called an enzyme. Proteases are the enzymes that catalyze the hydrolysis of the peptide bond in (poly)peptides and proteins.

Under physiological conditions proteases catalyse the hydrolysis of the peptide bond. The International Union of Biochemistry and Molecular Biology (1984) has recommended the use of the term peptidase for the subset of peptide bond hydrolases (Subclass EC 3.4). The terms protease and peptide hydrolase are synonymous with peptidase and may also be used here. Proteases comprise two classes of enzymes: the endo-peptidases and the exo-peptidases, which cleave peptide bonds at points within the protein and remove amino acids sequentially from either the N- or C-terminus, respectively. Proteinase is used as a synonym for endo-peptidase. The peptide bond may occur in the context of di-, tri-, tetra-peptides, peptides, polypeptides or proteins. In general the amino acid composition of natural peptides and polypeptides comprises 20 different amino acids, which exhibit the L-configuration (except for glycine, which does not have a chiral centre). However, the proteolytic activity of proteases is not limited to peptides that contain only the 20 natural amino acids. Peptide bonds between so-called non-natural amino acids can be cleaved too, as well as peptide bonds between modified amino acids or amino acid analogues. Some proteases do accept D-enantiomers of amino acids at certain positions. In general the remarkable stereoselectivity of proteases makes them very useful in the process of chemical resolution. Many proteases exhibit interesting side activities such as esterase activity, thiol esterase activity and (de)amidase activity. These side activities are usually not limited to amino acids only and might turn out to be very useful in bioconversions in the area of fine chemicals.

There are a number of reasons why proteases of filamentous fungi, eukaryotic microorganisms, are of particular interest. The basic process of hydrolytic cleavage of peptide bonds in proteins appears costly and potentially detrimental to an organism if not properly controlled. The desired limits to proteolytic action are achieved through the specificity of proteinases, by compartmentalization of proteases and substrates within the cell, through modification of the substrates allowing recognition by the respective proteases, by regulation via zymogen activation, and the presence or absence of specific inhibitors, as well as through the regulation of protease gene expression. In fungi, proteases are also involved in other fundamental cellular processes, including intracellular protein turnover, processing, translocation, sporulation, germination and differentiation. In fact, Aspergillus nidulans and Neurospora crassa have been used as model organisms for analyzing the molecular basis of a range of physiological and developmental processes. Their genetics enable direct access to biochemical and genetic studies, under defined nutrient and cultivation conditions. Furthermore, a large group of fungi pathogenic to humans, livestock and crops has been isolated, and proteolysis has been suggested to play a role in their pathogenicity (host penetration, countering host defense mechanisms and/or nutrition during infection). Proteases are also frequently used in laboratory, clinical and industrial processes; both microbial and non-microbial proteases are widely used in the food industry (baking, brewing, cheese manufacturing, meat tenderizing), in the tanning industry and in the manufacture of biological detergents (Aunstrup, 1980). The commercial interest in exploiting certain filamentous fungi, especially the Aspergilli, as hosts for the production of both homologous and heterologous proteins, has also recently renewed interest in fungal proteases (van Brunt, 1986ab). 
Proteases often cause problems in heterologous expression and homologous overexpression of proteins in fungi. In particular, heterologous expression is hampered by the proteolytic degradation of the expressed products by homologous proteases. These commercial interests have resulted in detailed studies of proteolytic spectra and construction of protease deficient strains and have improved the knowledge about protease expression and regulation in these organisms. Consequently there is a great need to identify and eliminate novel proteases in filamentous fungi.

Micro-organisms such as fungi are particularly useful in the large scale production of proteins, in particular when such proteins are secreted into the medium. Proteolytic enzymes play a role in these production processes. On the one hand particular proteolytic enzymes are in general required for proper processing of the target protein and the metabolic well-being of the production host. On the other hand proteolytic degradation may significantly decrease the yield of secreted proteins. Poor folding in the secretion pathway may lead to degradation by intracellular proteases. This might be a particular problem with producing heterologous proteins. The details of the proteolytic processes that are responsible for the degradation of proteins diverted from the secretory process in fungi are not exactly known. In eukaryotes the degradation of cellular proteins is achieved by a proteasome and usually involves ubiquitin labelling of proteins to be degraded. In fungi, proteasomal and vacuolar proteases are also likely candidates for the proteolytic degradation of poorly folded secretory proteins. The proteolytic degradation is likely cytoplasmic, but endoplasmic reticulum resident proteases cannot be excluded. From the aspect of production host strain improvement the proteolytic system may be an interesting target for genetic engineering. Additional copies of protease genes, over-expression of certain proteases, modification of transcriptional control, as well as knock-out procedures for deletion of protease genes may provide a more detailed insight into the function of a given protease. Deletion of protease encoding genes can be a valuable strategy for host strain improvement in order to improve production yield for homologous as well as heterologous proteins.

Eukaryotic microbial proteases have been reviewed by North (1982). More recently, Suarez Rendueles and Wolf (1988) have reviewed the S. cerevisiae proteases and their function.

Apart from the hydrolytic cleavage of bonds, proteases may also be applied in the formation of bonds. Bonds in this aspect comprise not only peptide and amide bonds but also ester bonds. Whether a protease catalyses the cleavage or the formation of a particular bond does in the first place depend on the thermodynamics of the reaction. An enzyme such as a protease does not affect the equilibrium of the reaction. The equilibrium is dependent on the particular conditions under which the reaction occurs. Under physiological conditions the thermodynamics of the reactions is in favour of the hydrolysis of the peptide due to the thermodynamically very stable structure of the zwitterionic product. By application of physical-chemical principles to influence the equilibrium, or by manipulating the concentrations or the nature of the reactants and products, or by exploiting the kinetic parameters of the enzyme reaction it is possible to apply proteases for the purpose of synthesis of peptide bonds. The addition of water miscible organic solvents decreases the extent of ionisation of the carboxyl component, thereby increasing the concentration of substrate available for the reaction. Biphasic systems, water mimetics, reverse micelles, anhydrous media, or modified amino and carboxyl groups to invoke precipitation of products are often employed to improve yields. When the proteases with the right properties are available the application of proteases for synthesis offers substantial advantages. As proteases are stereoselective as well as regio-selective, sensitive groups on the reactants do usually not need protection and reactants do not need to be optically pure. As conditions of enzymatic synthesis are mild, racemization and decomposition of labile reactants or products can be prevented. Apart from bonds between amino acids, also other compounds exhibiting a primary amino group, a thiol group or a carboxyl group may be linked by properly selected proteases. 
In addition esters, thiol esters and amides may be synthesized by certain proteases. Proteases have been shown to exhibit regioselectivity in the acylation of mono-, di- and tri-saccharides, nucleosides, and riboflavin. Problems with stability under the sometimes harsh reaction conditions may be prevented by proper formulation. Encapsulation and immobilisation not only stabilise enzymes but also allow easy recovery and separation from the reaction medium. Extensive crosslinking, treatment with aldehydes or covering the surface with certain polymers such as dextrans, polyethyleneglycol or polyimines may substantially extend the lifetime of the biocatalyst.

The Natural Roles of Proteases

Traditionally, proteases have been regarded as degrading enzymes, capable of cleaving proteins into small peptides and/or amino acids, and whose role it is to digest nutrient protein or to participate in the turnover of cellular proteins. In addition, it has been shown that proteases also play key roles in a wide range of cellular processes, via mechanisms of selective modification by limited proteolysis, and thus can have essential regulatory functions (Holzer and Tschensche 1979; Holzer and Heinrich, 1980). The specificity of a proteinase is assumed to be closely related to its physiological function and its mode of expression. With respect to the function of a particular protease, its localisation is often very important; for example, a lot of the vacuolar and periplasmic proteases are involved in protein degradation, while many of the membrane-bound proteases are important in protein processing (Suarez Rendueles and Wolf, 1988). The different roles of proteases in many cellular processes can be divided into four main functions of proteases: 1) protein degradation, 2) posttranslational processing and (in)activation of specific proteins, 3) morphogenesis, and 4) pathogenesis.

An obvious role for proteases in organisms which utilise protein as a nutrient source is in the hydrolysis of nutrients. In fungi, this would involve the degradation outside the cells by extracellular broad specificity proteases. Protein degradation is also important for rapid turnover of cellular proteins and allows the cell to remove abnormal proteins and to adapt their complement of protein to changing physiological conditions. Generally, proteases of rather broad specificity should be extremely well-controlled in order to protect the cell from random degradation of other than correct target proteins.

In contrast to hydrolysis, the synthesis of polypeptides occurs in vivo by an ATP driven process on the ribosome. Ultimately the sequence in which the amino acids are linked is dictated by the information derived from the genome. This process is known as translation. Primary translation products are often longer than the final functional products, and after translation further processing of such precursor proteins by proteases is usually required. Proteases play a key role in the maturation of such precursor proteins to obtain the final functional protein. In contrast to the very controlled trimming and reshaping of proteins, proteases can also be very destructive and may completely degrade polypeptides into peptides and amino acids. In order to avoid proteolytic activity being unleashed before it is required, proteases are subject to extensive regulation. Many proteases are synthesized as larger precursors known as zymogens, which become activated when required. Remarkably this activation always occurs by proteolysis. Apart from direct involvement in the processing, selective activation and inactivation of individual proteins are well-known phenomena catalyzed by specific proteases.

The selectivity of limited proteolysis appears to reside more directly in the proteinase-substrate interaction. Specificity may be derived from the proteolytic enzyme, which recognizes only specific amino acid target sequences. On the other hand, it may also be the result of selective exposure of the ‘processing site’ under certain conditions such as pH, ionic strength or secondary modifications, thus allowing an otherwise non-specific protease to catalyze a highly specific event. The activation of vacuolar zymogens by limited proteolysis gives an example of the latter kind.

Morphogenesis or differentiation can be defined as a regulated series of events leading to changes from one state to another in an organism. Although direct relationships between proteases and morphological effects could not be established in many cases, the present evidence suggests a significant involvement of proteases in fungal morphogenesis; apart from the observed extensive protein turnover during differentiation, sporulation and spore germination, proteases are thought to be directly involved in normal processes such as hyphal tip branching and septum formation (Deshpande, 1992).

Species of Aspergillus, in particular A. fumigatus and A. flavus, have been implicated as the causative agents of a number of diseases in humans and animals called aspergillosis (Bodey and Vartivarian, 1989). It has been repeatedly suggested that proteases are involved in the virulence of A. fumigatus and A. flavus, just as many studies link secreted proteases to the virulence of bacteria. In fact, most human infections due to Aspergillus species are characterised by an extensive degradation of the parenchyma of the lung, which is mainly composed of collagen and elastin (Campbell et al., 1994). Research has been focussed on the putative role of the secreted proteases in the virulence of A. fumigatus and A. flavus, which are the main human pathogens and are known to possess elastinolytic and collagenolytic activities (Kolattukudy et al., 1993). These elastinolytic activities were shown to correlate in vitro with infectivity in mice (Kothary et al., 1984). Two secreted proteases are known to be produced by A. fumigatus and A. flavus, an alkaline serine protease (ALP) and a neutral metallo protease (MEP). In A. fumigatus both the genes encoding these proteases were isolated, characterised and disrupted (Reicherd et al., 1990; Tang et al., 1992, 1993; Jaton-Ogay et al., 1994). However, alp mep double mutants showed no differences in pathogenicity when compared with wild type strains. Therefore, it must be concluded that the secreted A. fumigatus proteases identified in vitro are not essential factors for the invasion of tissue (Jaton-Ogay et al., 1994). Although A. fumigatus accounts for only a small proportion of the airborne mould spores, it is the most frequently isolated fungus from lung and sputum (Schmitt et al., 1991). Other explanations for the virulence of the fungus could be that the conditions in the bronchi (temperature and nutrients) are favourable for the parasitic growth of A. fumigatus. 
As a consequence, invasive aspergillosis could be a circumstantial event, when the host defences have been weakened by immunosuppressive treatments or diseases like AIDS.

Four major classes of proteases are known and are designated by the principal functional groups in their active site: the ‘serine’, the ‘thiol’ or ‘cysteine’, the ‘aspartic’ or ‘carboxyl’ and the ‘metallo’ proteases. A detailed state of the art review on these major classes of proteases, minor classes and unclassified proteases can be found in Methods in Enzymology volumes 244 and 248 (A. J. Barrett ed., 1994 and 1995).

Specificity of Proteases

Apart from the catalytic machinery of proteases another important aspect of proteolytic enzymes is the specificity of proteases. The specificity of a protease indicates which substrates the protease is likely to hydrolyze. The twenty natural amino acids offer a large number of possibilities to make up peptides. For example, with twenty amino acids one can already make up 400 different dipeptides and 8000 different tripeptides, and so on. With longer peptides the number of possibilities will become almost unlimited. Certain proteases hydrolyze only particular sequences at a very specific position. The interaction of the protease with the peptide substrate may encompass one up to ten amino acid residues of the peptide substrate. With large proteinaceous substrates there may be even more residues of the substrate that interact with the proteases. However, this likely involves less specific interactions with protease residues outside the active site binding cleft. In general the specific recognition is restricted to the linear peptide, which is bound in the active site of the protease.
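
By way of illustration only, the growth in the number of possible peptides can be sketched as follows. The sketch assumes that each position of a linear peptide may be occupied independently by any of the twenty natural amino acids, so that a peptide of n residues has 20 to the power n possible sequences; the function name peptide_count is hypothetical and is introduced here for illustration.

```python
# Number of distinct linear peptide sequences of a given length, assuming
# each position can independently hold any of the 20 natural amino acids.
NUM_AMINO_ACIDS = 20


def peptide_count(length: int) -> int:
    """Return the number of distinct sequences of `length` residues."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return NUM_AMINO_ACIDS ** length


if __name__ == "__main__":
    for n in (2, 3, 5, 10):
        print(f"{n:2d} residues: {peptide_count(n)} possible sequences")
```

The count grows by a factor of twenty with every added residue, which is why the number of possible substrates is practically unlimited for longer peptides.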

The nomenclature to describe the interaction of a substrate with a protease was introduced in 1967 by Schechter and Berger (Biochem. Biophys. Res. Commun., 1967, 27, 157-162) and is now widely used in the literature. In this system, it is considered that the amino acid residues of the polypeptide substrate bind to so-called sub-sites in the active site. By convention, these sub-sites on the protease are called S (for sub-sites) and the corresponding amino acid residues are called P (for peptide). The amino acid residues of the N-terminal side of the scissile bond are numbered P3, P2, P1 and those residues of the C-terminal side are numbered P1′, P2′, P3′. The P1 and P1′ residues are the amino acid residues located directly adjacent to the scissile bond. The substrate residues around the cleavage site can then be numbered up to P8. The corresponding sub-sites on the protease that complement the substrate binding residues are numbered S3, S2, S1, S1′, S2′, S3′, etc. The preferences of the sub-sites in the peptide binding site determine the preference of the protease for cleaving certain specific amino acid sequences at a particular spot. The amino acid sequence of the substrate should conform with the preferences exhibited by the sub-sites. The specificity towards a certain substrate is clearly dependent both on the binding affinity for the substrate and on the velocity at which subsequently the scissile bond is hydrolysed. Therefore the specificity of a protease for a certain substrate is usually indicated by its kcat/Km ratio, better known as the specificity constant. In this specificity constant kcat represents the turn-over rate and Km is the Michaelis constant, which is often taken as an approximation of the dissociation constant of the enzyme-substrate complex.
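
The Schechter and Berger numbering may be illustrated by the following minimal sketch, which labels the residues of a substrate on either side of a chosen scissile bond. The function name schechter_berger_labels, the example peptide and the chosen bond are hypothetical and serve only as an illustration; a plain apostrophe stands in for the prime symbol.

```python
def schechter_berger_labels(peptide: str, cleave_after: int) -> dict[str, str]:
    """Label substrate residues around the scissile bond.

    `peptide` is a one-letter amino acid sequence written from the amino
    terminus to the carboxyl terminus. `cleave_after` is the 1-based position
    of the residue that contributes the carboxyl group to the scissile bond;
    that residue becomes P1 and the next residue becomes P1'.
    """
    if not 1 <= cleave_after < len(peptide):
        raise ValueError("the scissile bond must lie between two residues")
    labels = {}
    # N-terminal side: numbering increases when moving away from the bond.
    for i, residue in enumerate(reversed(peptide[:cleave_after]), start=1):
        labels[f"P{i}"] = residue
    # C-terminal side: primed positions, also counted away from the bond.
    for i, residue in enumerate(peptide[cleave_after:], start=1):
        labels[f"P{i}'"] = residue
    return labels


if __name__ == "__main__":
    # Arbitrary example: bond between the third and fourth residue.
    for label, residue in schechter_berger_labels("AVKLPG", 3).items():
        print(label, residue)
```

The sub-site on the protease that accommodates a given residue carries the same number, so that the residue labelled P2 is bound in sub-site S2 and the residue labelled P1' in sub-site S1'.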

Apart from amino acid residues involved in catalysis and binding, proteases contain many other essential amino acid residues. Some residues are critical in folding, some residues maintain the overall three dimensional architecture of the protease, some residues may be involved in regulation of the proteolytic activity and some residues may target the protease to a particular location. Many proteases contain outside the active site one or more binding sites for metal ions. These metal ions often play a role in stabilizing the structure. In addition secreted eukaryotic microbial proteases may be extensively glycosylated. Both N- and O-linked glycosylation occur. Glycosylation may aid protein folding, may increase solubility, prevent aggregation and as such stabilize the mature protein. In addition the extent of glycosylation may influence secretion as well as water binding by the protein.

Regulation of Proteolytic Activity

A substantial number of proteases are subject to extensive regulation of the proteolytic activity in order to avoid undesired proteolytic damage. To a certain extent this regulation takes place at transcription level. For example in fungi the transcription of secreted protease genes appears to be sensitive to external carbon and nitrogen sources, whereas genes encoding intracellular proteases are insensitive. The extracellular pH is sensed by fungi and some genes are regulated by pH. In this process transcriptional regulator proteins play a crucial role. Proteolytic processing of such regulator proteins is often the switch that turns the regulator proteins either on or off.

Proteases are subject to intra- as well as intermolecular regulation. This implies that certain amino acids in the proteolytic enzyme molecule are essential for such regulation. Proteases are typically synthesized as larger precursors known as zymogens, which are catalytically inactive. Usually the peptide chain extension rendering the precursor protease inactive is located at the amino terminus of the protease. The precursor is better known as pro-protein. As many of the proteases processed in this way are secreted from the cells they contain in addition a signal sequence (pre sequence) so that the complete precursor is synthesized as a pre-pro-protein. Apart from rendering the protease inactive the pro-peptide often is essential for mediating productive folding. Examples of such proteases include serine proteases (alpha lytic protease, subtilisin, aqualysin, prohormone convertase), thiol proteases (cathepsin L and cruzain), aspartic proteases (proteinase A and cathepsin D) and metalloproteases. In addition the pro-peptide might play a role in cellular transport either alone or in conjunction with signal peptides. It may facilitate interaction with cellular chaperones or it may facilitate transport over the membrane. The size of the extension in the precursor pre-pro-protein may vary substantially, ranging from a short peptide fragment to a polypeptide, which can exist as an autonomous folding unit. In particular these larger extensions are often observed to be strong inhibitors of the protease even after cleavage from the protease. It was observed that even after cleavage such pro-peptides could assist in proper folding of the proteases. As such pro-peptides can be considered to function as molecular chaperones and separate or additional co-expression of such pro-peptides could be advantageous for protease production.

There is a substantial difference in the level of regulation between proteases that are secreted into the medium and proteases that remain intracellular. After activation, proteases secreted into the medium are usually no longer subject to control and are therefore usually relatively simple in their molecular architecture, consisting of one globular module. Intracellular proteases are necessarily subject to continuous control in order to avoid damage to the cells. In contrast with zymogens of secreted proteases, in more complex regulatory proteases very large polypeptide segments may be inserted between the signal and the zymogen activation domain of the proteolytic module. Structure-function studies indicate that such non-protease parts may be involved in interactions with macroscopic structures, membranes, cofactors, substrates, effectors, inhibitors and ions that regulate activity and activation of the proteolytic module(s) or its (their) zymogens. The non-proteolytic modules exhibit remarkable variation in size and structure. Many of the modules can exist as such independently from the proteolytic module. Therefore such modules can be considered to correspond to independent structural and functional units that are autonomous with respect to folding. The value of such a modular organization is that acquisition of new modules can endow the recipient protease with novel binding specificities and can lead to dramatic changes in its activity, regulation and targeting. The principle of modularly organized proteolytic enzymes may also be exploited by applying molecular biology tools in order to create novel interactions, regulation, specificity, and/or targeting by shuffling of modules. Although in general such additional modules are observed as N- or C-terminal extensions, large insertions within the exterior loops of the catalytic domain have also been observed. 
It is believed that also in this case the principal fold of the protease represents still the essential topology to form a functional proteolytic entity and that the insertion can be regarded as substructure folded onto the surface of the proteolytic module.

Molecular Structure

In principle the modular organization of larger proteins is a general theme in nature. In particular within the larger multimodular frameworks typical proteolytic modules show sizes of 100 to 400 amino acids on the average. This corresponds with the average size of most of the globular proteolytic enzymes that are secreted into the medium. As discussed above polypeptide modules are polypeptide fragments, which can fold and function as independent entities. Another term for such modules is domains. However domain is used in a broader context than module. The term domain as used herein refers usually to a part of the polypeptide chain that depicts in the three-dimensional structure a typical folding topology. In a protein domains interact to varying extents, but less extensively than do the structural elements within domains. Other terms such as subdomain and folding unit are also used in literature. As such it is observed that many proteins that share a particular functionality may share the same domains. Such domains can be recognized from the primary structure that may show certain sequence patterns, which are typical for a particular domain. Typical examples are the mononucleotide binding fold, cellulose binding domains, helix-turn-helix DNA binding motif, zinc fingers, EF hands, membrane anchors. Modules refer to those domains which are expected to be able to fold and function autonomously. A person skilled in the art knows how to identify particular domains in a primary structure by applying commonly available computer software to said structure and homologous sequences from other organisms or species.

Although multimodular or multidomain proteins may appear as a string of beads, assemblies of substantially more complex architecture have been observed. In case the various beads reside on the same polypeptide chain the beads are generally called modules or domains. When the beads do not reside on one and the same polypeptide chain but form assemblies via non-covalent interactions then the term subunit is used to designate the bead. Subunits may be encoded by one and the same gene or by different genes. The multi-modular protein may become proteolytically processed after translation, leading to multiple subunits. Individual subunits may consist of multiple domains. Typically the smaller globular proteins of 100-300 amino acids consist of only one domain.

Molecular Classification of Proteolytic Enzymes

In general proteases are classified according to their molecular properties or according to their functional properties. The molecular classification is based on the primary structure of the protease. The primary structure of a protein represents its amino acid sequence, which can be derived from the nucleotide sequence of the corresponding gene. Tracing extensively the similarities in the primary structures may reveal similarities in catalytic mechanism and other properties, which even may extend to functional properties. The term family is used to describe a group of proteases that show evolutionary relationship based on similarity between their primary structures. The members of such a family are believed to have arisen by divergent evolution from the same ancestor. Within a family further sub-grouping of the primary structures based on more detailed refinement of sequence comparisons results in subfamilies. Classification according to three-dimensional fold of the proteases may comprise secondary structure, tertiary structure and quaternary structure. In general the classification on secondary structure is limited to content and gross orientation of secondary structure elements. Similarities in tertiary structure have led to the recognition of superfamilies or clans. A superfamily or a clan is a group of families that are thought to have common ancestry as they show a common 3-dimensional fold. In general tertiary structure is more conserved than the primary structure. As a consequence similarity of the primary structure does not always reflect similar functional properties. In fact functional properties may have diverged substantially resulting in interesting new properties. At present quaternary structure has not been applied to classify various proteases. This might be due to a certain bias of the structural databases towards simple globular proteases. 
Many proteolytic systems that are subject to activation, regulation, or complex reaction cascades are likely to consist of multiple domains or subunits. General themes in the structural organization of such protease systems may lead to new types of classification.

Classification According to Specificity.

In the absence of sequence information proteases have been subject to various types of functional classification. The classification and naming of enzymes by reference to the reactions which are catalyzed is a general principle in enzyme nomenclature. This approach is also the underlying principle of the EC numbering of enzymes (Enzyme Nomenclature 1992 Academic Press, Orlando). Two types of proteases (EC 3.4) can be recognized within Enzyme Nomenclature 1992, those of the exo-peptidases (EC 3.4.11-19) and those of the endo-peptidases (EC 3.4.21-24, 3.4.99). Endo-peptidases cleave peptide bonds in the inner regions of the peptide chain, away from the termini. Exo-peptidases cleave residues only from the ends of the peptide chain. The exo-peptidases acting at the free N-terminus may liberate a single amino acid residue, a dipeptide or a tripeptide and are called respectively amino peptidases (EC 3.4.11), dipeptidyl peptidases (EC 3.4.14) and tripeptidyl peptidases (EC 3.4.14). Proteases starting peptide processing from the carboxyl terminus and liberating a single amino acid are called carboxy peptidases (EC 3.4.16-18). Peptidyl-dipeptidases (EC 3.4.15) remove a dipeptide from the carboxyl terminus. The dipeptidases (EC 3.4.13) are exo- and endo-peptidases in one; they specifically cleave only dipeptides into their two amino acid halves. Omega peptidases (EC 3.4.19) remove terminal residues that are either substituted, cyclic, or linked by isopeptide bonds.
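By way of illustration only, the EC sub-subclass ranges named in the preceding paragraph can be written as a small lookup. This is a minimal sketch: the table contents are taken from the paragraph, while the names EC_3_4_GROUPS and classify_protease and the returned tuple format are hypothetical and not part of Enzyme Nomenclature.

```python
# Sub-subclass ranges of EC 3.4 as listed in the paragraph above.
# Names and return format are hypothetical, chosen for illustration.
EC_3_4_GROUPS = [
    (11, 11, "exo-peptidase", "aminopeptidase"),
    (13, 13, "exo-peptidase", "dipeptidase"),
    (14, 14, "exo-peptidase", "dipeptidyl or tripeptidyl peptidase"),
    (15, 15, "exo-peptidase", "peptidyl-dipeptidase"),
    (16, 18, "exo-peptidase", "carboxypeptidase"),
    (19, 19, "exo-peptidase", "omega peptidase"),
    (21, 24, "endo-peptidase", "endo-peptidase"),
    (99, 99, "endo-peptidase", "endo-peptidase"),
]


def classify_protease(ec_number):
    """Return (mode of action, group name) for an EC 3.4.x[.y] number."""
    fields = ec_number.strip().split(".")
    if len(fields) < 3 or fields[0] != "3" or fields[1] != "4":
        raise ValueError("not a protease (EC 3.4) number: " + ec_number)
    sub_subclass = int(fields[2])
    for low, high, mode, name in EC_3_4_GROUPS:
        if low <= sub_subclass <= high:
            return mode, name
    raise ValueError("sub-subclass not listed in the text: " + ec_number)
```

For example, classify_protease("3.4.11.1") yields the aminopeptidase group, whereas a number outside EC 3.4 is rejected rather than guessed.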

Apart from the position where the protease cleaves a peptide chain, for each type of protease a further division is possible based on the nature of the preferred amino acid residues in the substrate. In general one can distinguish proteases with broad, medium and narrow specificity. Some proteases are simply named after the specific proteins or polypeptides that they hydrolyze, e.g. keratinase, collagenase, elastase. A narrow specificity may come down to one particular amino acid that is removed or one particular sequence that is cleaved. When the protease shows a particular preference for one amino acid in the P1 or P1′ position the name of this amino acid may be a qualifier. For example prolyl amino peptidase removes proline from the amino terminus of a peptide (proline is the P1 residue). X-Pro or proline is used when the bond on the imino side of the proline is cleaved (proline is the P1′ residue), e.g. proline carboxypeptidase removes proline from the carboxyl terminus. Prolyl endopeptidase (or Pro-X) cleaves behind proline while proline endopeptidase (X-Pro) cleaves in front of a proline. The amino acid residue in front of the scissile peptide bond refers to the amino acid residue that contributes the carboxyl group to the peptide bond. The amino acid residue behind the scissile peptide bond refers to the amino acid residue that contributes the amino group to the peptide bond. According to the general convention an amino acid chain runs from the amino terminus (the start) to the carboxyl terminus (the end) and is numbered accordingly. Endo proteases may also show a clear preference for a particular amino acid in the P1 or P1′ position, e.g. glycyl endopeptidase, peptidyl-lysine endopeptidase, glutamyl endopeptidase. In addition proteases may show a preference for a certain group of amino acids that share a certain resemblance. 
Such a group of preferred amino acids may comprise the hydrophobic amino acids, only the bulky hydrophobic amino acids, small hydrophobic or just small amino acids, large positively charged amino acids, etc. Apart from preferences for P1 and P1′ residues, particular preferences or exclusions may also exist for residues bound by other subsites on the protease. Such multiple preferences can result in proteases that are very specific for only those sequences that satisfy multiple binding requirements at the same time. In general it should be realized that proteases are rather promiscuous enzymes. Even a very specific protease may cleave peptides that do not comply with the generally observed preference of the protease. In addition it should be realized that environmental conditions such as pH, temperature, ionic strength, water activity, presence of solvents, and presence of competing substrates or inhibitors may influence the preferences of the proteases. Environmental conditions may not only influence the protease but also influence the way the proteinaceous substrate is presented to the protease.
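The P1 and P1′ convention described above can be made concrete with a short sketch. It assumes an idealised protease that cleaves every bond matching the stated preference, which, as noted above, real proteases do not do; the function name cleave and its parameters are hypothetical and serve only as illustration.

```python
def cleave(sequence, p1=None, p1_prime=None):
    """Split a one-letter amino acid sequence at every bond whose P1
    residue (in front of the bond) is in `p1` and whose P1' residue
    (behind the bond) is in `p1_prime`. A preference left as None is
    not checked. Idealised: every matching bond is cleaved."""
    if p1 is None and p1_prime is None:
        return [sequence]
    fragments = []
    start = 0
    for i in range(1, len(sequence)):
        # The bond under test lies between sequence[i-1] (P1)
        # and sequence[i] (P1').
        if p1 is not None and sequence[i - 1] not in p1:
            continue
        if p1_prime is not None and sequence[i] not in p1_prime:
            continue
        fragments.append(sequence[start:i])
        start = i
    fragments.append(sequence[start:])
    return fragments
```

With the hypothetical peptide AAPGGPK, a prolyl endopeptidase (Pro-X, proline in P1) is modelled as cleave("AAPGGPK", p1="P") and gives AAP, GGP and K, whereas a proline endopeptidase (X-Pro, proline in P1′) is modelled as cleave("AAPGGPK", p1_prime="P") and gives AA, PGG and PK.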

Classification by Catalytic Mechanism.

Proteases can be subdivided on the basis of their catalytic mechanism. It should be understood that for each catalytic mechanism the above classification based on specificity leads to a further subdivision. Four major classes of proteases are known and are designated by the principal functional group in the active site: the serine proteases (EC 3.4.21 endo peptidase, EC 3.4.16 carboxy peptidase), the thiol or cysteine proteases (EC 3.4.22 endo peptidase, EC 3.4.18 carboxy peptidase), the carboxyl or aspartic proteases (EC 3.4.23 endo peptidase) and metallo proteases (EC 3.4.24 endo peptidase, EC 3.4.17 carboxy peptidase). There are characteristic inhibitors of the members of each catalytic type of protease. These small inhibitors irreversibly modify an amino acid residue of the protease active site. For example, the serine proteases are inactivated by Phenyl Methane Sulfonyl Fluoride (PMSF) and Diisopropyl Fluoro Phosphate (DFP), which react with the active Serine, whereas the chloromethylketone derivatives react with the Histidine of the catalytic triad. Phosphoramidon and 1,10 Phenanthroline typically inhibit metallo proteases. Inhibition by Pepstatin generally indicates an aspartic protease. E64 inhibits thiol proteases specifically. Amastatin and Bestatin inhibit various aminopeptidases. Substantial variations in susceptibility of the proteases to the inhibitors are observed, even within one catalytic class. To a certain extent this might be related to the specificity of the protease. If the binding site architecture prevents a mechanism based inhibitor from approaching the catalytic site, such a protease escapes inhibition and identification of the type of mechanism based on inhibition is not possible. 
Chymostatin, for example, is a potent inhibitor of serine proteases with chymotrypsin-like specificity; Elastatinal inhibits elastase-like serine proteases and does not react with trypsin or chymotrypsin; 4-amidino PMSF (APMSF) inhibits only serine proteases with trypsin-like specificity. Extensive accounts of the use of inhibitors in the classification of proteases include Barrett and Salvesen, Proteinase Inhibitors, Elsevier Amsterdam, 1986; Bond and Beynon (eds), Proteolytic Enzymes, A Practical Approach, IRL Press, Oxford, 1989; Methods in Enzymology, eds E. J. Barrett, volume 244, 1994 and volume 248, 1995; E. Shaw, Cysteinyl proteinases and their selective inactivation, Adv Enzymol. 63:271-347 (1990).
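The class-diagnostic inhibitors named above can be collected in a simple lookup, sketched below. The mapping contains only the inhibitor and class pairs stated in the text; the names INHIBITOR_CLASS and candidate_classes are hypothetical. As pointed out above, a protease may escape inhibition, so an empty result does not exclude any class, and a positive result is an indication rather than proof.

```python
# Inhibitor -> catalytic class, restricted to the pairs stated in the text.
INHIBITOR_CLASS = {
    "pmsf": "serine",
    "dfp": "serine",
    "phosphoramidon": "metallo",
    "1,10-phenanthroline": "metallo",
    "pepstatin": "aspartic",
    "e64": "thiol",
}


def candidate_classes(inhibited_by):
    """Return the set of catalytic classes suggested by the inhibitors
    that were observed to inactivate a protease preparation."""
    classes = set()
    for name in inhibited_by:
        key = name.strip().lower()
        if key in INHIBITOR_CLASS:
            classes.add(INHIBITOR_CLASS[key])
    return classes
```

A result containing more than one class, for instance after inhibition by both PMSF and Pepstatin, would suggest a mixture of proteases rather than a single enzyme.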

Classification According to Optimal Performance Conditions.

The catalytic mechanism of a protease and the requirement for its conformational integrity mainly determine the conditions under which the protease can be utilized. Finding the protease that performs optimally under application conditions is a major challenge. Often the conditions under which proteases have to perform are not optimal and represent a compromise between the ideal conditions for a particular application and the conditions which would suit the protease best. Apart from the particular properties of the protease it should be realized that the presentation of a proteinaceous substrate is also dependent on the conditions, and as such also determines which conditions are most effective for proteolysis. Specifications for the enzyme that are relevant for application comprise for example the pH dependence, the temperature dependence, sensitivity to or dependence on metal ions, ionic strength, salt concentration and solvent compatibility. Another factor of major importance is the specific activity of a protease. The higher the enzyme's specific activity, the less enzyme is needed for a specific conversion. Lower enzyme requirements imply lower costs and lower protein contamination levels.

The pH is a major parameter that determines protease performance in an application. Therefore pH dependence is an important parameter for grouping proteases. The major groups that are recognized are the acid proteases, the neutral proteases, the alkaline proteases and the high alkaline proteases. The optimum pH matches the proteolytic mechanism only to some extent, e.g. aspartic proteases often show an optimum at acidic pH, metalloproteases and thiol proteases often perform optimally around neutral to slightly alkaline pH, and serine peptidases are mainly active in the alkaline and high alkaline region. For each class exceptions are known. In addition the overall water activity of the system plays a role. The pH optimum of a protease is defined as the pH range where the protease exhibits an optimal hydrolysis rate for the majority of its substrates in a particular environment under particular conditions. This range can be narrow, e.g. one pH unit, as well as quite broad, 3-4 pH units. In general the pH optimum is also dependent on the nature of the proteinaceous substrate. Both the turnover rate and the specificity may vary as a function of pH. For a certain efficacy it can be desirable to use the protease far from its pH optimum because production of less desired peptides is avoided. Less desired peptides might be for example very short peptides or peptides causing a bitter taste. In addition a narrower specificity can be a reason to choose conditions that deviate from optimal conditions with respect to turnover rate. Depending on the pH the specificity may be narrow, e.g. only cleaving the peptide chain in one particular position or before or after one particular amino acid, or broader, e.g. cleaving a chain at multiple positions or cleaving before or after more different types of amino acids. In fact the pH dependence might be an important tool to regulate the proteolytic activity in an application. 
In case the pH shifts during the process the proteolysis might cease spontaneously without the need for further treatment to inactivate the protease. In some cases the proteolysis itself may be the driver of the pH shift.

Crucial for the application of proteases is their handling and operating stability. As protease stability is strongly affected by the working temperature, stability is often also referred to as thermostability. In general the stability of a protease indicates how long a protease retains its proteolytic activity under particular conditions. Particular conditions may comprise fermentation conditions, conditions during isolation and downstream processing of the enzyme, storage conditions, formulation and operating or application conditions. Where the particular conditions encompass elevated temperatures, stability in general refers to thermostability. Apart from the general causes of enzyme inactivation such as chemical modification, unfolding, aggregation etc., the main problem with proteases is that they are easily subject to autodegradation. Especially for the utilization of proteases the temperature optimum is a relevant criterion to group proteases. Although there are different definitions, economically the most useful definition is the temperature or the temperature range in which the protease is most productive in a certain application. Protease productivity is a function of both the stability and the turnover rate. Whereas elevated temperature in general will increase the turnover rate, rapid inactivation will counteract the increase in turnover rate and ultimately lead to low productivity. The conformational stability of the protease under a given process condition will determine its maximum operating temperature. The temperature at which the protease loses its active conformation, often indicated as the unfolding or melting point, can be determined according to various methods, for example NMR, Circular Dichroism Spectroscopy, Differential Scanning Calorimetry etc. For proteases unfolding is usually accompanied by a tremendous increase in autodegradation rate.
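The trade-off between turnover rate and stability can be illustrated numerically. The sketch below assumes, purely for illustration, that the protease inactivates by first-order kinetics and that substrate is not limiting, so that the product formed over a period t equals (v0/k)(1 - exp(-k t)), with v0 the initial rate and k the inactivation rate constant. This model, the function name product_formed and all numerical values are assumptions and are not taken from the description.

```python
import math


def product_formed(initial_rate, k_inact, duration):
    """Product formed over `duration` by an enzyme whose rate starts at
    `initial_rate` and decays by first-order inactivation with rate
    constant `k_inact` (assumed model; consistent arbitrary units)."""
    if k_inact == 0:
        # A perfectly stable enzyme keeps its initial rate throughout.
        return initial_rate * duration
    return initial_rate / k_inact * (1.0 - math.exp(-k_inact * duration))


# Hypothetical comparison over 10 time units: a fast but labile enzyme
# versus one with half the initial rate that is 100 times more stable.
fast_but_labile = product_formed(initial_rate=2.0, k_inact=1.0, duration=10.0)
slow_but_stable = product_formed(initial_rate=1.0, k_inact=0.01, duration=10.0)
```

Under these assumed values the slower but more stable enzyme forms several times more product, which is the sense in which productivity depends on both stability and turnover rate.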

In applications where low temperatures are required, proteases may be selected with emphasis on a high intrinsic activity at low to moderate temperature. As inactivation is relatively slow under such conditions, activity might largely determine productivity. In processes where protease activity is required only during a short period, the stability of the protease might be used as a switch to turn the protease off. In such a case a more labile rather than a very thermostable protease might be preferred.

Another environmental parameter which may play a role in selecting the appropriate protease is its sensitivity to salts. The compatibility with metal ions which are found frequently at low concentrations in various natural materials can be crucial for certain applications. In particular with metallo proteases certain ions may replace the catalytic metal ion and reduce or even completely abolish activity. In some applications metal ions have to be added on purpose in order to prevent the washout of the metal ions coordinated to the protease. It is well known that for the sake of enzyme stability and life-time, calcium ions have to be supplied in order to prevent dissociation of protein bound calcium.

Most microorganisms show a certain tolerance with respect to adapting to changes in the environmental conditions. As a consequence the proteolytic spectrum that the organism is able to produce is likely to show at least similar tolerances. Such a proteolytic spectrum might be covered by many proteases together covering the whole spectrum or by only a few proteases of a broad spectrum. When considering the whole proteolytic spectrum of a microorganism it can be very important to take the cellular location into account.

Cellular Localisation and Characterization of Proteolytic Processing and Degradation

From an industrial point of view the proteases which are excreted from the cell have specific advantages with respect to producibility at a large scale and stress tolerance, as they have to survive without protection of the cell. The large group of cellular proteases can be further subdivided into soluble and membrane bound. Membrane bound proteases may comprise proteases at the inside as well as the outside of the membrane. Intracellular soluble proteases may be subdivided further according to the specific compartments of the cell where they occur. As the cell shields the proteases to some extent from the environment and because the cell controls the conditions in the cell, intracellular proteases might be more sensitive to large environmental changes and their optima might correlate better with the specific intracellular conditions. Knowing the conditions of the cellular compartment where the protease resides might indicate their preferences. Whereas extracellular proteases in general do not require any further regulation once excreted from the cell, intracellular proteases are often subject to more complicated control and regulation.

With respect to the function of a particular protease, its localisation is often very important; for example, a lot of the vacuolar and periplasmic proteases are involved in protein degradation, while many of the membrane-bound proteases are important in protein processing (Suarez Rendueles and Wolf, 1988).

A comprehensive review on the biological properties and evolution of proteases has been published in van den Hombergh: Thesis Landbouwuniversiteit Wageningen: An analysis of the proteolytic system in Aspergillus in order to improve protein production ISBN 90-5485-545-2, which is hereby incorporated by reference herein.

The Protease Problem

An important reason for the interest in microbial proteases is the protease related expression problems observed in several expression hosts used in the bioprocess industry. The increasing use of heterologous hosts for the production of proteins by recombinant DNA technology has recently brought this problem into focus, since it seems that heterologous proteins are more prone to proteolysis (Archer et al., 1992; van den Hombergh et al., 1996b).

In S. cerevisiae, the protease problem was recognised as early as the early eighties, as was the involvement of several proteases, which complicates targeted gene disruption approaches to overcome this problem. During secretion a protein is exposed to several proteolytic activities residing in the secretory pathway. Additionally, in a prototrophic microorganism such as Aspergillus, secreted proteins can be exposed to several extracellular proteolytic activities.

The problem of degradation of heterologously expressed proteins is well documented in Aspergillus (van den Hombergh Thesis Landbouwuniversiteit Wageningen: An analysis of the proteolytic system in Aspergillus in order to improve protein production ISBN 90-5485-545-2) and has been reported in the expression of cow prochymosin, human interferon α-2, tPA, GMCSF, IL6, lactoferrin, chicken egg-white lysozyme, porcine pIA2, A. niger pectin lyase B, E. coli enterotoxin B and β-glucuronidase, and Erwinia carotovora pectate lyase 3.

The problem of proteolysis may be addressed at several stages in protein production. Bioprocess engineers may address the problem of proteolysis by downstream processing at low temperatures, by early separation of product and protease(s) or by use of protease inhibitors. These may all lead to successful reduction of the problem. However it is certainly not eliminated, because much of the degradation occurs in vivo during the production of the protein.

In understanding how proteolysis is controlled in the cell, a major question concerns the recognition mechanism by which proteolysis is triggered. To what extent are proteolytically susceptible (heterologous) proteins recognised as aberrant because of misfolding or, if correctly folded, as ‘foreign’, because they do not possess features essential for stability which are specific to the host? Various types of stress can cause the overall proteolysis in a cell to increase significantly. Factors known to increase the rate of proteolysis include nutrient starvation and various other types of stress (e.g. elevation of temperature, osmotic stress, toxic substances and expression of certain heterologous proteins). To deal with proteolysis-related expression problems in vivo, several approaches have proven successful, as will be discussed below. However, we have to keep in mind that true ‘non-proteolytic cells’ cannot exist, since proteolysis by intracellular proteases is involved in many essential metabolic and ‘housekeeping’ reactions. Reducing proteolysis will therefore always be a process in which the changed genetic background which results in decreased proteolytic activity has to be analysed for potential secondary effects which could lead to reduced protein production (e.g. reduced growth rate or sporulation).

Disruption of Proteases in Filamentous Fungal Expression Hosts

Berka and coworkers (1990) describe the cloning and disruption of the A. awamori pepA gene. More recently, three disrupted aspartyl proteases in A. niger have been described. Disruptants for both the major extracellular aspartyl proteases and the major vacuolar aspartyl protease were described. Double and triple disruptants were generated via recombination and tested for protease spectra and for expression and secretion of the A. niger pectin lyase PELB protein, which is very susceptible to proteolytic degradation (van den Hombergh et al., 1995). Disruption of pepA and pepB both resulted in a reduction of extracellular protease activities, of 80% and 6%, respectively. In the ΔpepE disruptant other (vacuolar) protease activities were also severely affected, owing to inactivation of the proteolytic cascade for other vacuolar proteases. Reduced extracellular activities correlated with reduced in vitro degradation of PELB and improved in vivo expression of pelB (van den Hombergh et al., 1996f).

Protease Deficient (prt) Mutants of Filamentous Fungi

Several Aspergillus protease deficient mutants have been studied to determine whether protein production is improved. Archer and coworkers describe the reduced proteolysis of hen egg white lysozyme in supernatants of an A. niger double prt mutant generated by Mattern and coworkers (1992) and conclude that although the degradation is not absent, it is significantly reduced. Van den Hombergh et al. (1995) show that the in vitro degradation of A. niger PELB is reduced in all seven prt complementation groups they have isolated. Virtually no degradation is observed in the prtB, prtF and prtG mutants. Recently, the expression of the pelB gene was shown to be improved in six complementation groups tested (prtA-F) and highest expression levels were observed in the prtB, prtF and prtG mutants. In addition to the single mutants, which contained residual extracellular proteolytic activities varying from 2-80% compared to wild type activity, double mutants were generated both by recombination and by additional rounds of mutagenesis. Via this approach several double prt mutants were selected and further characterised, which showed a further reduction of PELB degradation compared to their parental strains.

Instead of elimination of protease activities via disruption or mutagenesis, reduced proteolysis can also be achieved via down-regulation of the interfering proteolytic activities. This may be achieved by genetically altering the promoter or other regulatory sequences of the gene. As shown by Fraissinet-Tachet and coworkers (1996) the extracellular proteases in A. niger are all regulated by carbon catabolite repression and nitrogen metabolite repression. Nutrient starvation also causes the overall proteolysis rate in a cell to increase strongly, which makes sense for a cell that lacks nutrients but possesses proteins that under starvation conditions are not needed or are needed only in smaller amounts. In expression strategies which allow high expression on media containing high glucose and ammonium concentrations, reduced proteolysis has been reported. Several constitutive glycolytic promoters (gpd and pkiA) are highly expressed under these conditions and can also be used to drive (heterologous) gene expression in continuous fermentations. The type of nutrient starvation imposed can influence different proteases to varying extents, which means that the importance of nutrient conditions in a given process depends on the type of proteolysis that is involved. Specific proteolysis may therefore be induced by conditions of substrate limitation which are frequently used in many large-scale fermentation processes.

The protease problem can nowadays be addressed in part by one or more of the above strategies. However, the residual proteolytic activity of yet unidentified proteolytic enzymes still constitutes a major problem in the art. In order to further reduce the level of unwanted proteolysis, there is a great need in the art to identify novel proteases responsible for degradation of homologously and heterologously expressed proteins. This invention provides such novel protease gene sequences encoding novel proteases. Once the primary sequence of a novel protease gene is known, one or more of the above recombinant DNA strategies may be employed to produce (knock-out) mutants with reduced proteolytic activity.

Despite the widespread applications of proteases in a great number of industrial processes, current enzymes also have significant shortcomings with respect to at least one of the following properties.

When added to animal feed, current proteases are not sufficiently resistant to digestive enzymes present in the gastrointestinal (GI) tract of e.g. pigs and poultry.

With respect to another aspect, the currently available enzymes are not sufficiently resistant to specific (high) temperatures and (high) pressure conditions that are applied during extrusion or pelleting operations.

Also, the current enzymes are not sufficiently active in a pH range of 3-7, conditions prevailing in many food and beverage products as well as in the GI tract of most animals.

According to yet another aspect the specificity of the currently available proteases is very limited which results in the inability of the existing enzymes to degrade or to dissolve certain “protease resistant” proteins thus resulting in low peptide or amino acid yields. Moreover proteases with new specificities allow the synthesis of new peptides.

Yet another drawback of the currently available enzymes is their low specific activity.

It is therefore clear that for a large number of applications a strong desire exists for proteases that are more resistant to digestive enzymes, high temperature and/or pressure and which exhibit novel specificities regarding their sites of hydrolysis. The present invention provides such enzymes.

OBJECT OF THE INVENTION

It is an object of the invention to provide novel polynucleotides encoding novel proteases. A further object is to provide naturally and recombinantly produced proteases as well as recombinant strains producing these. Such strains may also be used to produce classical fermentation products faster or with higher yields. Yet another object of the invention is to provide a filamentous fungus strain defective in producing a protease according to the invention. Such strains may be used for a more efficient production of heterologous or homologous proteins. Also antibodies and fusion polypeptides are part of the invention as well as methods of making and using the polynucleotides and polypeptides according to the invention.

SUMMARY OF THE INVENTION

The invention provides for novel polynucleotides encoding novel proteases.

More in particular, the invention provides for polynucleotides having a nucleotide sequence that hybridises (preferably under highly stringent conditions) to a sequence according to a sequence selected from the group consisting of SEQ ID NO: 1 to SEQ ID NO: 57 or to a sequence selected from the group consisting of SEQ ID NO: 58 to SEQ ID NO: 114. Consequently, the invention provides nucleic acids that are about 60%, preferably 65%, more preferably 70%, even more preferably 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% homologous to the sequences according to a sequence selected from the group consisting of SEQ ID NO: 1 to SEQ ID NO: 57 or a sequence selected from the group consisting of SEQ ID NO: 58 to SEQ ID NO: 114.
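For orientation only, a percentage such as those listed above can be computed from two sequences that have already been aligned. The sketch below assumes one simple convention, namely identical positions divided by the number of alignment columns, with columns that are gaps in both sequences ignored. It is not the comparison method defined for the invention, and the function name percent_identity is hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of identical positions between two pre-aligned
    sequences of equal length, with '-' marking a gap. Assumed
    convention: identity = matches / alignment columns * 100."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # a column that is a gap in both carries no information
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("no aligned positions to compare")
    return 100.0 * matches / columns
```

Under this convention a gap opposite a residue counts as a mismatch, so the hypothetical alignment ATG-C against ATGAC scores four identical positions out of five columns.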

In a more preferred embodiment the invention provides for such an isolated polynucleotide obtainable from a filamentous fungus, preferably from Aspergilli, in particular A. niger.

In one embodiment, the invention provides for an isolated polynucleotide comprising a nucleic acid sequence encoding a polypeptide with an amino acid sequence selected from the group consisting of SEQ ID NO: 115 to SEQ ID NO: 171 or functional equivalents thereof.

In a further preferred embodiment, the invention provides an isolated polynucleotide encoding at least one functional domain of a polypeptide according to a sequence selected from the group consisting of SEQ ID NO: 115 to SEQ ID NO: 171 or functional equivalents thereof.

In a preferred embodiment the invention provides a protease gene according to a sequence selected from the group consisting of SEQ ID NO: 1 to SEQ ID NO: 57. In another aspect the invention provides a polynucleotide, preferably a cDNA encoding an A. niger protease selected from the group consisting of SEQ ID NO: 115 to SEQ ID NO: 171 or variants or fragments of that polypeptide. In a preferred embodiment the cDNA has a sequence selected from the group consisting of SEQ ID NO: 58 to SEQ ID NO: 114 or functional equivalents thereof.

A genomic clone encoding a polypeptide according to the invention may also be obtained by selecting suitable probes to specifically amplify a genomic region corresponding to any of the sequences according to SEQ ID NO: 1 to SEQ ID NO: 57 or fragments thereof, hybridising that probe under suitable conditions to genomic DNA obtained from a suitable organism, such as Aspergillus, e.g. A. niger, amplifying the desired fragment e.g. by PCR (polymerase chain reaction) followed by purifying and cloning of the amplified fragment.

In an even further preferred embodiment, the invention provides for a polynucleotide comprising the coding sequence of the genomic polynucleotides according to the invention, preferred is a polynucleotide sequence selected from the group consisting of SEQ ID NO: 58 to SEQ ID NO: 114.

In another preferred embodiment, the invention provides a cDNA obtainable by cloning and expressing a sequence selected from the group consisting of SEQ ID NO: 1 to SEQ ID NO: 57 into a suitable host organism, such as A. niger.

A polypeptide according to the invention may also be obtained by cloning and expressing a sequence selected from the group consisting of SEQ ID NO: 1 to SEQ ID NO: 57 into a suitable host organism, such as A. niger.

The invention also relates to vectors comprising a polynucleotide sequence according to the invention and primers, probes and fragments that may be used to amplify or detect the DNA according to the invention.

In a further preferred embodiment, a vector is provided wherein the polynucleotide sequence according to the invention is functionally linked with regulatory sequences suitable for expression of the encoded amino acid sequence in a suitable host cell, such as A. niger or A. oryzae. The invention also provides methods for preparing polynucleotides and vectors according to the invention.

The invention also relates to recombinantly produced host cells that contain heterologous or homologous polynucleotides according to the invention.

In one embodiment, the invention provides recombinant host cells wherein the expression of a protease according to the invention is significantly reduced or wherein the activity of the protease is reduced or wherein the protease is even inactivated. Such recombinants are especially useful for the expression of homologous or heterologous proteins.

In another embodiment, the invention provides recombinant host cells wherein the expression of a protease according to the invention is significantly increased or wherein the activity of the protease is increased. Such recombinants are especially useful for the expression of homologous or heterologous proteins where maturation is seriously hampered in case the required proteolytic cleavage becomes the rate limiting step.

In another embodiment the invention provides for a recombinantly produced host cell that contains heterologous or homologous DNA according to the invention, preferably DNA encoding proteins bearing signal sequences and wherein the cell is capable of producing a functional protease according to the invention, preferably a cell capable of over-expressing the protease according to the invention, for example an Aspergillus strain comprising an increased copy number of a gene or cDNA according to the invention.

In another embodiment the invention provides for a recombinantly produced host cell that contains heterologous or homologous DNA according to the invention and wherein the cell is capable of secreting a functional protease according to the invention, preferably a cell capable of over-expressing and secreting the protease according to the invention, for example an Aspergillus strain comprising an increased copy number of a gene or cDNA according to the invention.

In yet another aspect of the invention, a purified polypeptide is provided. The polypeptides according to the invention include the polypeptides encoded by the polynucleotides according to the invention. Especially preferred is a polypeptide according to a sequence selected from the group consisting of SEQ ID NO: 115 to SEQ ID NO: 171 or functional equivalents thereof.

The invention also provides for antibodies reactive with a polypeptide according to the invention. These antibodies may be polyclonal, yet especially preferred are monoclonal antibodies. Such antibodies are particularly useful for purifying the polypeptides according to the invention.

Fusion proteins comprising a polypeptide according to the invention are also within the scope of the invention. The invention also provides methods of making the polypeptides according to the invention.

The invention further relates to a method for diagnosing aspergillosis either by detecting the presence of a polypeptide according to the invention or functional equivalents thereof, or by detecting the presence of a DNA according to the invention or fragments or functional equivalents thereof.

The invention also relates to the use of the protease according to the invention in an industrial process as described herein.

DETAILED DESCRIPTION OF THE INVENTION

Polynucleotides

The present invention provides polynucleotides encoding proteases having an amino acid sequence selected from the group consisting of SEQ ID NO: 115 to SEQ ID NO: 171 or functional equivalents thereof. The sequence of these genes was determined by sequencing a genomic clone obtained from Aspergillus niger. The invention provides polynucleotide sequences comprising the gene encoding these proteases as well as their complete cDNA sequence and its coding sequence. Accordingly, the invention relates to an isolated polynucleotide comprising a nucleotide sequence selected from the group consisting of SEQ ID NO: 1 to SEQ ID NO: 57 or a sequence selected from the group consisting of SEQ ID NO: 58 to SEQ ID NO: 114 or functional equivalents thereof.

More in particular, the invention relates to an isolated polynucleotide hybridisable under stringent conditions to a polynucleotide selected from the group consisting of SEQ ID NO: 1 to SEQ ID NO: 57 or a sequence selected from the group consisting of SEQ ID NO: 58 to SEQ ID NO: 114 preferably under highly stringent conditions. Advantageously, such polynucleotides may be obtained from filamentous fungi, in particular from Aspergillus niger. More specifically, the invention relates to an isolated polynucleotide having a nucleotide sequence according to a sequence selected from the group consisting of SEQ ID NO: 1 to SEQ ID NO: 57 or a sequence selected from the group consisting of SEQ ID NO: 58 to SEQ ID NO: 114.

The invention also relates to an isolated polynucleotide encoding at least one functional domain of a polypeptide according to a sequence selected from the group consisting of SEQ ID NO: 115 to SEQ ID NO: 171 or functional equivalents thereof.

As used herein, the terms “gene” and “recombinant gene” refer to nucleic acid molecules which may be isolated from chromosomal DNA, which include an open reading frame encoding a protein, e.g. an A. niger protease. A gene may include coding sequences, non-coding sequences, introns and regulatory sequences. Moreover, a gene refers to an isolated nucleic acid molecule as defined herein.

A nucleic acid molecule of the present invention, such as a nucleic acid molecule having the nucleotide sequence of a sequence selected from the group consisting of SEQ ID NO: 1 to SEQ ID NO: 57 or a sequence selected from the group consisting of SEQ ID NO: 58 to SEQ ID NO: 114 or a functional equivalent thereof, can be isolated using standard molecular biology techniques and the sequence information provided herein. For example, using all or a portion of the nucleic acid sequence of a sequence selected from the group consisting of SEQ ID NO: 1 to SEQ ID NO: 57 or the nucleotide sequence of a sequence selected from the group consisting of SEQ ID NO: 58 to SEQ ID NO: 114 as a hybridization probe, nucleic acid molecules according to the invention can be isolated using standard hybridization and cloning techniques (e.g., as described in Sambrook, J., Fritsch, E. F., and Maniatis, T. Molecular Cloning: A Laboratory Manual. 2nd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989).

Moreover, a nucleic acid molecule encompassing all or a portion of a sequence selected from the group consisting of SEQ ID NO: 1 to SEQ ID NO: 57 or a sequence selected from the group consisting of SEQ ID NO: 58 to SEQ ID NO: 114 can be isolated by the polymerase chain reaction (PCR) using synthetic oligonucleotide primers designed based upon the sequence information contained in a sequence selected from the group consisting of SEQ ID NO: 1 to SEQ ID NO: 57 or a sequence selected from the group consisting of SEQ ID NO: 58 to SEQ ID NO: 114.

A nucleic acid of the invention can be amplified using cDNA, mRNA or alternatively, genomic DNA, as a template and appropriate oligonucleotide primers according to standard PCR amplification techniques. The nucleic acid so amplified can be cloned into an appropriate vector and characterized by DNA sequence analysis.

Furthermore, oligonucleotides corresponding to or hybridisable to nucleotide sequences according to the invention can be prepared by standard synthetic techniques, e.g., using an automated DNA synthesizer.

In a preferred embodiment, an isolated nucleic acid molecule of the invention comprises the nucleotide sequence shown in a sequence selected from the group consisting of SEQ ID NO: 58 to SEQ ID NO: 114. The sequence of a sequence selected from the group consisting of SEQ ID NO: 58 to SEQ ID NO: 114 corresponds to the coding region of the A. niger protease cDNA. This cDNA comprises sequences encoding the A. niger protease polypeptide according to a sequence selected from the group consisting of SEQ ID NO: 115 to SEQ ID NO: 171.

In another preferred embodiment, an isolated nucleic acid molecule of the invention comprises a nucleic acid molecule which is a complement of the nucleotide sequence shown in a sequence selected from the group consisting of SEQ ID NO: 1 to SEQ ID NO: 57 or a sequence selected from the group consisting of SEQ ID NO: 58 to SEQ ID NO: 114 or a functional equivalent of these nucleotide sequences.

A nucleic acid molecule which is complementary to another nucleotide sequence is one which is sufficiently complementary to the other nucleotide sequence such that it can hybridize to the other nucleotide sequence thereby forming a stable duplex.
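By way of illustration only, the notion of a complement strand can be sketched in Python as follows. The function names are hypothetical and the sketch covers only the limiting case of perfect Watson-Crick complementarity; the definition above requires merely that the strands be sufficiently complementary to form a stable duplex.

```python
# Watson-Crick pairing for DNA; N (unknown base) is mapped to N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand of `seq`, written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_perfect_complement(a: str, b: str) -> bool:
    """True if strand `b` pairs with strand `a` at every position."""
    return reverse_complement(a).upper() == b.upper()


if __name__ == "__main__":
    # Hypothetical four-base example, not taken from any SEQ ID NO.
    print(reverse_complement("ATGC"))
```

A practical test of hybridisability would tolerate some mismatches, which this sketch deliberately does not model.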

One aspect of the invention pertains to isolated nucleic acid molecules that encode a polypeptide of the invention or a functional equivalent thereof such as a biologically active fragment or domain, as well as nucleic acid molecules sufficient for use as hybridisation probes to identify nucleic acid molecules encoding a polypeptide of the invention and fragments of such nucleic acid molecules suitable for use as PCR primers for the amplification or mutation of nucleic acid molecules.

An “isolated polynucleotide” or “isolated nucleic acid” is a DNA or RNA that is not immediately contiguous with both of the coding sequences with which it is immediately contiguous (one on the 5′ end and one on the 3′ end) in the naturally occurring genome of the organism from which it is derived. Thus, in one embodiment, an isolated nucleic acid includes some or all of the 5′ non-coding (e.g., promoter) sequences that are immediately contiguous to the coding sequence. The term therefore includes, for example, a recombinant DNA that is incorporated into a vector, into an autonomously replicating plasmid or virus, or into the genomic DNA of a prokaryote or eukaryote, or which exists as a separate molecule (e.g., a cDNA or a genomic DNA fragment produced by PCR or restriction endonuclease treatment) independent of other sequences. It also includes a recombinant DNA that is part of a hybrid gene encoding an additional polypeptide that is substantially free of cellular material, viral material, or culture medium (when produced by recombinant DNA techniques), or chemical precursors or other chemicals (when chemically synthesized). Moreover, an “isolated nucleic acid fragment” is a nucleic acid fragment that is not naturally occurring as a fragment and would not be found in the natural state.

As used herein, the terms “polynucleotide” or “nucleic acid molecule” are intended to include DNA molecules (e.g., cDNA or genomic DNA) and RNA molecules (e.g., mRNA) and analogs of the DNA or RNA generated using nucleotide analogs. The nucleic acid molecule can be single-stranded or double-stranded, but preferably is double-stranded DNA. The nucleic acid may be synthesized using oligonucleotide analogs or derivatives (e.g., inosine or phosphorothioate nucleotides). Such oligonucleotides can be used, for example, to prepare nucleic acids that have altered base-pairing abilities or increased resistance to nucleases.

Another embodiment of the invention provides an isolated nucleic acid molecule which is antisense to a protease nucleic acid molecule, e.g., the coding strand of a protease nucleic acid molecule. Also included within the scope of the invention are the complement strands of the nucleic acid molecules described herein.

Sequencing Errors

The sequence information as provided herein should not be so narrowly construed as to require inclusion of erroneously identified bases. The specific sequences disclosed herein can be readily used to isolate the complete gene from filamentous fungi, in particular A. niger which in turn can easily be subjected to further sequence analyses thereby identifying sequencing errors.

Unless otherwise indicated, all nucleotide sequences determined by sequencing a DNA molecule herein were determined using an automated DNA sequencer and all amino acid sequences of polypeptides encoded by DNA molecules determined herein were predicted by translation of a DNA sequence determined as above. Therefore, as is known in the art for any DNA sequence determined by this automated approach, any nucleotide sequence determined herein may contain some errors. Nucleotide sequences determined by automation are typically at least about 90% identical, more typically at least about 95% to at least about 99.9% identical to the actual nucleotide sequence of the sequenced DNA molecule. The actual sequence can be more precisely determined by other approaches including manual DNA sequencing methods well known in the art. As is also known in the art, a single insertion or deletion in a determined nucleotide sequence compared to the actual sequence will cause a frame shift in translation of the nucleotide sequence such that the predicted amino acid sequence encoded by a determined nucleotide sequence will be completely different from the amino acid sequence actually encoded by the sequenced DNA molecule, beginning at the point of such an insertion or deletion.
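The effect of a single inserted base on the predicted amino acid sequence can be illustrated with the following minimal Python sketch. The short DNA sequence is invented purely for illustration and does not correspond to any sequence disclosed herein; translation uses the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate `dna` in frame 1; trailing bases short of a codon are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Hypothetical "actual" sequence and a "determined" sequence that
# contains one erroneous extra base after the second codon.
actual = "ATGGCTAAAGGTCTGTTCTGA"
determined = actual[:6] + "C" + actual[6:]

if __name__ == "__main__":
    print(translate(actual))
    print(translate(determined))
```

In this example the two translations agree up to the insertion point and differ at every residue thereafter, which is the frame shift described above.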

The person skilled in the art is capable of identifying such erroneously identified bases and knows how to correct for such errors.

Nucleic Acid Fragments, Probes and Primers

A nucleic acid molecule according to the invention may comprise only a portion or a fragment of the nucleic acid sequence shown in a sequence selected from the group consisting of SEQ ID NO: 1 to SEQ ID NO: 57 or a sequence selected from the group consisting of SEQ ID NO: 58 to SEQ ID NO: 114, for example a fragment which can be used as a probe or primer or a fragment encoding a portion of a protease protein. The nucleotide sequence determined from the cloning of the protease gene and cDNA allows for the generation of probes and primers designed for use in identifying and/or cloning other protease family members, as well as protease homologues from other species. The probe/primer typically comprises substantially purified oligonucleotide which typically comprises a region of nucleotide sequence that hybridizes preferably under highly stringent conditions to at least about 12 or 15, preferably about 18 or 20, preferably about 22 or 25, more preferably about 30, 35, 40, 45, 50, 55, 60, 65, or 75 or more consecutive nucleotides of a nucleotide sequence shown in a sequence selected from the group consisting of SEQ ID NO: 1 to SEQ ID NO: 57 or a sequence selected from the group consisting of SEQ ID NO: 58 to SEQ ID NO: 114 or of a functional equivalent thereof.

Probes based on the protease nucleotide sequences can be used to detect transcripts or genomic protease sequences encoding the same or homologous proteins for instance in other organisms. In preferred embodiments, the probe further comprises a label group attached thereto, e.g., the label group can be a radioisotope, a fluorescent compound, an enzyme, or an enzyme cofactor. Such probes can also be used as part of a diagnostic test kit for identifying cells which express a protease protein.

Identity & Homology

The terms “homology” or “percent identity” are used interchangeably herein. For the purpose of this invention, it is defined here that in order to determine the percent identity of two amino acid sequences or of two nucleic acid sequences, the sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced in the sequence of a first amino acid or nucleic acid sequence for optimal alignment with a second amino or nucleic acid sequence). The amino acid residues or nucleotides at corresponding amino acid positions or nucleotide positions are then compared. When a position in the first sequence is occupied by the same amino acid residue or nucleotide as the corresponding position in the second sequence, then the molecules are identical at that position. The percent identity between the two sequences is a function of the number of identical positions shared by the sequences (i.e., % identity=number of identical positions/total number of positions (i.e. overlapping positions)×100).

Preferably, the two sequences are the same length.
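The percent identity formula given above can be sketched in Python as follows. The function name and the use of "-" as the gap character are assumptions made for this illustration, as is the choice to count every alignment column (other than a column containing gaps in both sequences) as a position.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two already-aligned sequences of equal length.

    % identity = identical positions / total positions x 100
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Columns that are gaps in both sequences carry no information.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == gap and y == gap)]
    if not columns:
        raise ValueError("no positions to compare")
    identical = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    # Hypothetical aligned fragments: one mismatch in four positions.
    print(percent_identity("ACGT", "ACGA"))
```

Other conventions for the denominator exist (for example, the length of the shorter sequence), and they give different values for gapped alignments.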

The skilled person will be aware of the fact that several different computer programs are available to determine the homology between two sequences. For instance, a comparison of sequences and determination of percent identity between two sequences can be accomplished using a mathematical algorithm. In a preferred embodiment, the percent identity between two amino acid sequences is determined using the Needleman and Wunsch (J. Mol. Biol. (48):444-453 (1970)) algorithm which has been incorporated into the GAP program in the GCG software package (available at http://www.gcg.com), using either a Blosum 62 matrix or a PAM250 matrix, and a gap weight of 16, 14, 12, 10, 8, 6, or 4 and a length weight of 1, 2, 3, 4, 5, or 6. The skilled person will appreciate that all these different parameters will yield slightly different results but that the overall percentage identity of two sequences is not significantly altered when using different algorithms.
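For orientation, the global alignment principle of Needleman and Wunsch can be sketched in a few lines of Python. This is not the GAP program: the sketch assumes a simple match/mismatch score and a single linear gap penalty, whereas GAP uses a substitution matrix together with separate gap and length weights. All parameter names and default values are illustrative assumptions.

```python
def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -2):
    """Global alignment of `a` and `b`; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),  # align pair
                              score[i - 1][j] + gap,            # gap in b
                              score[i][j - 1] + gap)            # gap in a

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


if __name__ == "__main__":
    # Hypothetical fragments differing by one deleted base.
    print(needleman_wunsch("ACGT", "AGT"))
```

The two aligned strings returned by such a routine are what a percent identity calculation would take as input.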

In yet another embodiment, the percent identity between two nucleotide sequences is determined using the GAP program in the GCG software package (available at http://www.gcg.com), using a NWSgapdna.CMP matrix and a gap weight of 40, 50, 60, 70, or 80 and a length weight of 1, 2, 3, 4, 5, or 6. In another embodiment, the percent identity between two amino acid or nucleotide sequences is determined using the algorithm of E. Meyers and W. Miller (CABIOS, 4:11-17 (1989)) which has been incorporated into the ALIGN program (version 2.0) (available at http://vega/igh.cnrs.fr/bin/align-guess.cgi), using a PAM120 weight residue table, a gap length penalty of 12 and a gap penalty of 4.

The nucleic acid and protein sequences of the present invention can further be used as a “query sequence” to perform a search against public databases to, for example, identify other family members or related sequences. Such searches can be performed using the NBLAST and XBLAST programs (version 2.0) of Altschul, et al. (1990) J. Mol. Biol. 215:403-10. BLAST nucleotide searches can be performed with the NBLAST program, score=100, wordlength=12 to obtain nucleotide sequences homologous to protease nucleic acid molecules of the invention. BLAST protein searches can be performed with the XBLAST program, score=50, wordlength=3 to obtain amino acid sequences homologous to protease protein molecules of the invention. To obtain gapped alignments for comparison purposes, Gapped BLAST can be utilized as described in Altschul et al., (1997) Nucleic Acids Res. 25(17):3389-3402. When utilizing BLAST and Gapped BLAST programs, the default parameters of the respective programs (e.g., XBLAST and NBLAST) can be used. See http://www.ncbi.nlm.nih.gov.

Hybridisation

As used herein, the term “hybridizing” is intended to describe conditions for hybridization and washing under which nucleotide sequences at least about 50%, at least about 60%, at least about 70%, more preferably at least about 80%, even more preferably at least about 85% to 90%, more preferably at least 95% homologous to each other typically remain hybridized to each other.

A preferred, non-limiting example of such hybridization conditions is hybridization in 6× sodium chloride/sodium citrate (SSC) at about 45° C., followed by one or more washes in 1×SSC, 0.1% SDS at 50° C., preferably at 55° C., preferably at 60° C. and even more preferably at 65° C.

Highly stringent conditions include, for example, hybridizing at 68° C. in 5×SSC/5× Denhardt's solution/1.0% SDS and washing in 0.2×SSC/0.1% SDS at room temperature. Alternatively washing may be performed at 42° C.

The skilled artisan will know which conditions to apply for stringent and highly stringent hybridisation conditions. Additional guidance regarding such conditions is readily available in the art, for example, in Sambrook et al., 1989, Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Press, N.Y.; and Ausubel et al. (eds.), 1995, Current Protocols in Molecular Biology, (John Wiley & Sons, N.Y.).

Of course, a polynucleotide which hybridizes only to a poly(A) sequence (such as the 3′ terminal poly(A) tract of mRNAs), or to a complementary stretch of T (or U) residues, would not be included in a polynucleotide of the invention used to specifically hybridize to a portion of a nucleic acid of the invention, since such a polynucleotide would hybridize to any nucleic acid molecule containing a poly(A) stretch or the complement thereof (e.g., practically any double-stranded cDNA clone).
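The exclusion above can be illustrated with a small Python check that flags candidate probes consisting solely of A residues or solely of T (or U) residues. The function name is hypothetical and the check is a simplification: it does not consider probes that are merely rich in A or T.

```python
def is_homopolymer_probe(probe: str) -> bool:
    """True if `probe` consists only of A, or only of T/U, residues."""
    bases = set(probe.upper().replace("U", "T"))
    return bases in ({"A"}, {"T"})


if __name__ == "__main__":
    # Hypothetical candidate probes.
    for candidate in ("AAAAAAAAAAAAAAA", "TTTTTTTTTTTT", "AAAATGCAAAAA"):
        print(candidate, is_homopolymer_probe(candidate))
```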

Obtaining Full Length DNA from Other Organisms

In a typical approach, cDNA libraries constructed from other organisms, e.g. filamentous fungi, in particular from the genus Aspergillus, can be screened.

For example, Aspergillus strains can be screened for homologous protease polynucleotides by Northern blot analysis. Upon detection of transcripts homologous to polynucleotides according to the invention, cDNA libraries can be constructed from RNA isolated from the appropriate strain, utilizing standard techniques well known to those of skill in the art. Alternatively, a total genomic DNA library can be screened using a probe hybridisable to a protease polynucleotide according to the invention.

Homologous gene sequences can be isolated, for example, by performing PCR using two oligonucleotide primers or two degenerate oligonucleotide primer pools designed on the basis of nucleotide sequences as taught herein.
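A degenerate oligonucleotide primer pool can be thought of as the set of all specific primers encoded by a sequence written with IUPAC ambiguity codes. The following Python sketch expands such a sequence into its pool; the function name and the example primers are hypothetical and are not derived from the sequences disclosed herein.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer: str) -> list[str]:
    """Return every specific primer represented by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    # Hypothetical degenerate primer: Y = C or T, N = any base.
    pool = expand_degenerate("GAYTGGN")
    print(len(pool), pool)
```

The pool size is the product of the number of bases allowed at each position, so it grows quickly with the number of ambiguous positions.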

The template for the reaction can be cDNA obtained by reverse transcription of mRNA prepared from strains known or suspected to express a polynucleotide according to the invention. The PCR product can be subcloned and sequenced to ensure that the amplified sequences represent the sequences of a new protease nucleic acid sequence, or a functional equivalent thereof.

The PCR fragment can then be used to isolate a full length cDNA clone by a variety of known methods. For example, the amplified fragment can be labeled and used to screen a bacteriophage or cosmid cDNA library. Alternatively, the labeled fragment can be used to screen a genomic library.

PCR technology also can be used to isolate full length cDNA sequences from other organisms. For example, RNA can be isolated, following standard procedures, from an appropriate cellular or tissue source. A reverse transcription reaction can be performed on the RNA using an oligonucleotide primer specific for the most 5′ end of the amplified fragment for the priming of first strand synthesis.

The resulting RNA/DNA hybrid can then be “tailed” (e.g., with guanines) using a standard terminal transferase reaction, the hybrid can be digested with RNase H, and second strand synthesis can then be primed (e.g., with a poly-C primer). Thus, cDNA sequences upstream of the amplified fragment can easily be isolated. For a review of useful cloning strategies, see e.g., Sambrook et al., supra; and Ausubel et al., supra.

Vectors

Another aspect of the invention pertains to vectors, preferably expression vectors, containing a nucleic acid encoding a protease protein or a functional equivalent thereof. As used herein, the term “vector” refers to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. One type of vector is a “plasmid”, which refers to a circular double stranded DNA loop into which additional DNA segments can be ligated. Another type of vector is a viral vector, wherein additional DNA segments can be ligated into the viral genome. Certain vectors are capable of autonomous replication in a host cell into which they are introduced (e.g., bacterial vectors having a bacterial origin of replication and episomal mammalian vectors). Other vectors (e.g., non-episomal mammalian vectors) are integrated into the genome of a host cell upon introduction into the host cell, and thereby are replicated along with the host genome. Moreover, certain vectors are capable of directing the expression of genes to which they are operatively linked. Such vectors are referred to herein as “expression vectors”. In general, expression vectors of utility in recombinant DNA techniques are often in the form of plasmids. The terms “plasmid” and “vector” can be used interchangeably herein as the plasmid is the most commonly used form of vector. However, the invention is intended to include such other forms of expression vectors, such as viral vectors (e.g., replication defective retroviruses, adenoviruses and adeno-associated viruses), which serve equivalent functions.

The recombinant expression vectors of the invention comprise a nucleic acid of the invention in a form suitable for expression of the nucleic acid in a host cell, which means that the recombinant expression vector includes one or more regulatory sequences, selected on the basis of the host cells to be used for expression, which is operatively linked to the nucleic acid sequence to be expressed. Within a recombinant expression vector, “operatively linked” is intended to mean that the nucleotide sequence of interest is linked to the regulatory sequence(s) in a manner which allows for expression of the nucleotide sequence (e.g., in an in vitro transcription/translation system or in a host cell when the vector is introduced into the host cell). The term “regulatory sequence” is intended to include promoters, enhancers and other expression control elements (e.g., polyadenylation signal). Such regulatory sequences are described, for example, in Goeddel; Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif. (1990). Regulatory sequences include those which direct constitutive expression of a nucleotide sequence in many types of host cells and those which direct expression of the nucleotide sequence only in a certain host cell (e.g. tissue-specific regulatory sequences). It will be appreciated by those skilled in the art that the design of the expression vector can depend on such factors as the choice of the host cell to be transformed, the level of expression of protein desired, etc. The expression vectors of the invention can be introduced into host cells to thereby produce proteins or peptides, encoded by nucleic acids as described herein (e.g. protease proteins, mutant forms of protease proteins, fragments, variants or functional equivalents thereof, fusion proteins, etc.).

The recombinant expression vectors of the invention can be designed for expression of protease proteins in prokaryotic or eukaryotic cells. For example, protease proteins can be expressed in bacterial cells such as E. coli, insect cells (using baculovirus expression vectors), yeast cells or mammalian cells. Suitable host cells are discussed further in Goeddel, Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif. (1990). Alternatively, the recombinant expression vector can be transcribed and translated in vitro, for example using T7 promoter regulatory sequences and T7 polymerase.

Expression vectors useful in the present invention include chromosomal-, episomal- and virus-derived vectors e.g., vectors derived from bacterial plasmids, bacteriophage, yeast episome, yeast chromosomal elements, viruses such as baculoviruses, papova viruses, vaccinia viruses, adenoviruses, fowl pox viruses, pseudorabies viruses and retroviruses, and vectors derived from combinations thereof, such as those derived from plasmid and bacteriophage genetic elements, such as cosmids and phagemids.

The DNA insert should be operatively linked to an appropriate promoter, such as the phage lambda PL promoter, the E. coli lac, trp and tac promoters, the SV40 early and late promoters and promoters of retroviral LTRs, to name a few. Other suitable promoters will be known to the skilled person. In a specific embodiment, promoters are preferred that are capable of directing a high expression level of proteases in filamentous fungi. Such promoters are known in the art. The expression constructs may contain sites for transcription initiation, termination, and, in the transcribed region, a ribosome binding site for translation. The coding portion of the mature transcripts expressed by the constructs will include a translation initiating AUG at the beginning and a termination codon appropriately positioned at the end of the polypeptide to be translated.
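The requirement that the coding portion begin with an initiating AUG and end with an appropriately positioned termination codon can be checked with a sketch such as the following. The function name is hypothetical, the standard stop codons are assumed, and the check is deliberately minimal (it does not consider introns or regulatory elements).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_complete_coding_sequence(cds: str) -> bool:
    """True if `cds` starts with ATG/AUG, ends with a stop codon, is a
    whole number of codons long and has no internal in-frame stop codon."""
    cds = cds.upper().replace("U", "T")
    if len(cds) < 6 or len(cds) % 3 != 0:
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return (codons[0] == "ATG"
            and codons[-1] in STOP_CODONS
            and not any(c in STOP_CODONS for c in codons[1:-1]))


if __name__ == "__main__":
    # Hypothetical inserts, written as DNA.
    print(is_complete_coding_sequence("ATGGCTAAAGGTTGA"))
    print(is_complete_coding_sequence("ATGTAAGCTTGA"))
```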

Vector DNA can be introduced into prokaryotic or eukaryotic cells via conventional transformation or transfection techniques. As used herein, the terms “transformation” and “transfection” are intended to refer to a variety of art-recognized techniques for introducing foreign nucleic acid (e.g., DNA) into a host cell, including calcium phosphate or calcium chloride co-precipitation, DEAE-dextran-mediated transfection, transduction, infection, lipofection, cationic lipid mediated transfection or electroporation. Suitable methods for transforming or transfecting host cells can be found in Sambrook, et al. (Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989), Davis et al., Basic Methods in Molecular Biology (1986) and other laboratory manuals.

For stable transfection of mammalian cells, it is known that, depending upon the expression vector and transfection technique used, only a small fraction of cells may integrate the foreign DNA into their genome. In order to identify and select these integrants, a gene that encodes a selectable marker (e.g., resistance to antibiotics) is generally introduced into the host cells along with the gene of interest. Preferred selectable markers include those which confer resistance to drugs, such as G418, hygromycin and methotrexate. Nucleic acid encoding a selectable marker can be introduced into a host cell on the same vector as that encoding a protease protein or can be introduced on a separate vector. Cells stably transfected with the introduced nucleic acid can be identified by drug selection (e.g. cells that have incorporated the selectable marker gene will survive, while the other cells die).

Expression of proteins in prokaryotes is often carried out in E. coli with vectors containing constitutive or inducible promoters directing the expression of either fusion or non-fusion proteins. Fusion vectors add a number of amino acids to a protein encoded therein, e.g. to the amino terminus of the recombinant protein. Such fusion vectors typically serve three purposes: 1) to increase expression of recombinant protein; 2) to increase the solubility of the recombinant protein; and 3) to aid in the purification of the recombinant protein by acting as a ligand in affinity purification. Often, in fusion expression vectors, a proteolytic cleavage site is introduced at the junction of the fusion moiety and the recombinant protein to enable separation of the recombinant protein from the fusion moiety subsequent to purification of the fusion protein. Such enzymes, and their cognate recognition sequences, include Factor Xa, thrombin and enterokinase.

As indicated, the expression vectors will preferably contain selectable markers. Such markers include dihydrofolate reductase or neomycin resistance for eukaryotic cell culture and tetracycline or ampicillin resistance for culturing in E. coli and other bacteria. Representative examples of appropriate hosts include bacterial cells, such as E. coli, Streptomyces and Salmonella typhimurium; fungal cells, such as yeast; insect cells such as Drosophila S2 and Spodoptera Sf9; animal cells such as CHO, COS and Bowes melanoma; and plant cells. Appropriate culture media and conditions for the above-described host cells are known in the art.

Among vectors preferred for use in bacteria are pQE70, pQE60 and pQE-9, available from Qiagen; pBS vectors, Phagescript vectors, Bluescript vectors, pNH8A, pNH16A, pNH18A, pNH46A, available from Stratagene; and ptrc99a, pKK223-3, pKK233-3, pDR540, pRIT5 available from Pharmacia. Among preferred eukaryotic vectors are pWLNEO, pSV2CAT, pOG44, pZT1 and pSG available from Stratagene; and pSVK3, pBPV, pMSG and pSVL available from Pharmacia. Other suitable vectors will be readily apparent to the skilled artisan.

Known bacterial promoters for use in the present invention include the E. coli lacI and lacZ promoters, the T3 and T7 promoters, the gpt promoter, the lambda PR and PL promoters and the trp promoter. Known eukaryotic promoters include the HSV thymidine kinase promoter, the early and late SV40 promoters, the promoters of retroviral LTRs, such as those of the Rous sarcoma virus (“RSV”), and metallothionein promoters, such as the mouse metallothionein-I promoter.

Transcription of the DNA encoding the polypeptides of the present invention by higher eukaryotes may be increased by inserting an enhancer sequence into the vector. Enhancers are cis-acting elements of DNA, usually from about 10 to 300 bp, that act to increase transcriptional activity of a promoter in a given host cell-type. Examples of enhancers include the SV40 enhancer, which is located on the late side of the replication origin at bp 100 to 270, the cytomegalovirus early promoter enhancer, the polyoma enhancer on the late side of the replication origin, and adenovirus enhancers.

For secretion of the translated protein into the lumen of the endoplasmic reticulum, into the periplasmic space or into the extracellular environment, appropriate secretion signals may be incorporated into the expressed polypeptide. The signals may be endogenous to the polypeptide or they may be heterologous signals.

The polypeptide may be expressed in a modified form, such as a fusion protein, and may include not only secretion signals but also additional heterologous functional regions. Thus, for instance, a region of additional amino acids, particularly charged amino acids, may be added to the N-terminus of the polypeptide to improve stability and persistence in the host cell, during purification or during subsequent handling and storage. Also, peptide moieties may be added to the polypeptide to facilitate purification.

Polypeptides According to the Invention

The invention provides an isolated polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 115 to SEQ ID NO: 171, an amino acid sequence obtainable by expressing a polynucleotide according to the invention, in a preferred embodiment a sequence selected from the group consisting of SEQ ID NO: 1 to SEQ ID NO: 57, in an appropriate host, as well as an amino acid sequence obtainable by expressing a polynucleotide sequence selected from the group consisting of SEQ ID NO: 58 to SEQ ID NO: 114 in an appropriate host. Also, a peptide or polypeptide comprising a functional equivalent of the above polypeptides is comprised within the present invention. The above polypeptides are collectively comprised in the term “polypeptides according to the invention”.

The terms “peptide” and “oligopeptide” are considered synonymous (as is commonly recognized) and each term can be used interchangeably as the context requires to indicate a chain of at least two amino acids coupled by peptidyl linkages. The word “polypeptide” is used herein for chains containing more than seven amino acid residues. All oligopeptide and polypeptide formulas or sequences herein are written from left to right and in the direction from amino terminus to carboxy terminus. The one-letter code of amino acids used herein is commonly known in the art and can be found in Sambrook, et al. (Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989).

By “isolated” polypeptide or protein is intended a polypeptide or protein removed from its native environment. For example, recombinantly produced polypeptides and proteins expressed in host cells are considered isolated for purposes of the invention, as are native or recombinant polypeptides which have been substantially purified by any suitable technique such as, for example, the single-step purification method disclosed in Smith and Johnson, Gene 67:31-40 (1988).

The protease according to the invention can be recovered and purified from recombinant cell cultures by well-known methods including ammonium sulfate or ethanol precipitation, acid extraction, anion or cation exchange chromatography, phosphocellulose chromatography, hydrophobic interaction chromatography, affinity chromatography, hydroxylapatite chromatography and lectin chromatography. For analytical purposes most preferably, high performance liquid chromatography (“HPLC”) is employed for purification.

Polypeptides of the present invention include naturally purified products, products of chemical synthetic procedures, and products produced by recombinant techniques from a prokaryotic or eukaryotic host, including, for example, bacterial, yeast, higher plant, insect and mammalian cells. Depending upon the host employed in a recombinant production procedure, the polypeptides of the present invention may be glycosylated or may be non-glycosylated. In addition, polypeptides of the invention may also include an initial modified methionine residue, in some cases as a result of host-mediated processes.

Moreover, a protein according to the invention may be a precursor protein such as a zymogen, a hybrid protein, a protein obtained as a pro sequence or pre-pro sequence, or any other type of immature form.

Protein Fragments

The invention also features biologically active fragments of the polypeptides according to the invention.

Biologically active fragments of a polypeptide of the invention include polypeptides comprising amino acid sequences sufficiently identical to or derived from the amino acid sequence of the protease protein (e.g., the amino acid sequence of a sequence selected from the group consisting of SEQ ID NO: 115 to SEQ ID NO: 171), which include fewer amino acids than the full length protein, and exhibit at least one biological activity of the corresponding full-length protein. Typically, biologically active fragments comprise a domain or motif with at least one activity of the protease protein. A biologically active fragment of a protein of the invention can be a polypeptide which is, for example, 10, 25, 50, 100 or more amino acids in length. Moreover, other biologically active portions, in which other regions of the protein are deleted, can be prepared by recombinant techniques and evaluated for one or more of the biological activities of the native form of a polypeptide of the invention.

The invention also features nucleic acid fragments which encode the above biologically active fragments of the protease protein.

Fusion Proteins

The proteins of the present invention or functional equivalents thereof, e.g., biologically active portions thereof, can be operatively linked to a non-protease polypeptide (e.g., heterologous amino acid sequences) to form fusion proteins. As used herein, a protease “chimeric protein” or “fusion protein” comprises a protease polypeptide operatively linked to a non-protease polypeptide. A “protease polypeptide” refers to a polypeptide having an amino acid sequence corresponding to a polypeptide sequence according to the invention, whereas a “non-protease polypeptide” refers to a polypeptide having an amino acid sequence corresponding to a protein which is not substantially homologous to a protein according to the invention, e.g., a protein which is different from the protease protein and which is derived from the same or a different organism. Within a protease fusion protein the protease polypeptide can correspond to all or a portion of a protein according to the invention. In a preferred embodiment, a protease fusion protein comprises at least one biologically active fragment of a protein according to the invention. In another preferred embodiment, a protease fusion protein comprises at least two biologically active portions of a protein according to the invention. Within the fusion protein, the term “operatively linked” is intended to indicate that the protease polypeptide and the non-protease polypeptide are fused in-frame to each other. The non-protease polypeptide can be fused to the N-terminus or C-terminus of the protease polypeptide.

For example, in one embodiment, the fusion protein is a GST-protease fusion protein in which the protease sequences are fused to the C-terminus of the GST sequences.

Such fusion proteins can facilitate the purification of recombinant protease. In another embodiment, the fusion protein is a protease protein containing a heterologous signal sequence at its N-terminus. In certain host cells (e.g., mammalian and yeast host cells), expression and/or secretion of protease can be increased through use of a heterologous signal sequence.

In another example, the gp67 secretory sequence of the baculovirus envelope protein can be used as a heterologous signal sequence (Current Protocols in Molecular Biology, Ausubel et al., eds., John Wiley & Sons, 1992). Other examples of eukaryotic heterologous signal sequences include the secretory sequences of melittin and human placental alkaline phosphatase (Stratagene; La Jolla, Calif.). In yet another example, useful prokaryotic heterologous signal sequences include the phoA secretory signal (Sambrook et al., supra) and the protein A secretory signal (Pharmacia Biotech; Piscataway, N.J.).

A signal sequence can be used to facilitate secretion and isolation of a protein or polypeptide of the invention. Signal sequences are typically characterized by a core of hydrophobic amino acids which are generally cleaved from the mature protein during secretion in one or more cleavage events. Such signal peptides contain processing sites that allow cleavage of the signal sequence from the mature proteins as they pass through the secretory pathway. The signal sequence directs secretion of the protein, such as from a eukaryotic host into which the expression vector is transformed, and the signal sequence is subsequently or concurrently cleaved. The protein can then be readily purified from the extracellular medium by art recognized methods. Alternatively, the signal sequence can be linked to the protein of interest using a sequence which facilitates purification, such as with a GST domain. Thus, for instance, the sequence encoding the polypeptide may be fused to a marker sequence, such as a sequence encoding a peptide, which facilitates purification of the fused polypeptide. In certain preferred embodiments of this aspect of the invention, the marker sequence is a hexa-histidine peptide, such as the tag provided in a pQE vector (Qiagen, Inc.), among others, many of which are commercially available. As described in Gentz et al., Proc. Natl. Acad. Sci. USA 86:821-824 (1989), for instance, hexa-histidine provides for convenient purification of the fusion protein. The HA tag is another peptide useful for purification which corresponds to an epitope derived from influenza hemagglutinin protein, which has been described by Wilson et al., Cell 37:767 (1984), for instance.

Preferably, a protease chimeric or fusion protein of the invention is produced by standard recombinant DNA techniques. For example, DNA fragments coding for the different polypeptide sequences are ligated together in-frame in accordance with conventional techniques, for example by employing blunt-ended or stagger-ended termini for ligation, restriction enzyme digestion to provide for appropriate termini, filling-in of cohesive ends as appropriate, alkaline phosphatase treatment to avoid undesirable joining, and enzymatic ligation. In another embodiment, the fusion gene can be synthesized by conventional techniques including automated DNA synthesizers. Alternatively, PCR amplification of gene fragments can be carried out using anchor primers which give rise to complementary overhangs between two consecutive gene fragments which can subsequently be annealed and reamplified to generate a chimeric gene sequence (see, for example, Current Protocols in Molecular Biology, eds. Ausubel et al., John Wiley & Sons: 1992). Moreover, many expression vectors are commercially available that already encode a fusion moiety (e.g., a GST polypeptide). A protease-encoding nucleic acid can be cloned into such an expression vector such that the fusion moiety is linked in-frame to the protease protein.
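By way of non-limiting illustration only, the requirement that the fusion moiety be linked in-frame can be sketched as a simple check on the coding sequences before they are joined. The function name fuse_in_frame, the optional linker argument and the short sequences used below are hypothetical and are not taken from the sequence listing; the sketch merely assumes that each part is supplied as a complete coding sequence read from its first codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_codons(seq):
    """Split a sequence whose length is a multiple of three into codons."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def fuse_in_frame(upstream_cds, downstream_cds, linker=""):
    """Join two coding sequences (hypothetical helper) so that the
    downstream partner stays in the reading frame of the upstream one."""
    parts = {
        "upstream": upstream_cds.upper(),
        "linker": linker.upper(),
        "downstream": downstream_cds.upper(),
    }
    # Every part must contribute whole codons, otherwise the frame shifts.
    for name, seq in parts.items():
        if len(seq) % 3 != 0:
            raise ValueError(f"{name} length {len(seq)} is not a multiple of 3")

    # Drop the stop codon of the upstream partner so translation reads through.
    upstream = parts["upstream"]
    if upstream[-3:] in STOP_CODONS:
        upstream = upstream[:-3]

    fusion = upstream + parts["linker"] + parts["downstream"]

    # Any stop codon before the final codon would truncate the fusion protein.
    if any(codon in STOP_CODONS for codon in split_codons(fusion)[:-1]):
        raise ValueError("premature stop codon inside the fusion")
    return fusion


if __name__ == "__main__":
    # Toy sequences standing in for a tag and a protease coding sequence.
    print(fuse_in_frame("ATGAAATAA", "GCTGGTTAA", linker="GGA"))
```

A linker whose length is not a multiple of three is rejected rather than silently accepted, since such a linker would shift the reading frame of the downstream polypeptide.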

Functional Equivalents

The terms “functional equivalents” and “functional variants” are used interchangeably herein. Functional equivalents of a DNA according to the invention are isolated DNA fragments that encode a polypeptide that exhibits a particular function of an A. niger protease as defined herein. A functional equivalent of a polypeptide according to the invention is a polypeptide that exhibits at least one function of an A. niger protease as defined herein.

Functional protein or polypeptide equivalents may contain only conservative substitutions of one or more amino acids of a sequence selected from the group consisting of SEQ ID NO: 115 to SEQ ID NO: 171 or substitutions, insertions or deletions of non-essential amino acids. Accordingly, a non-essential amino acid is a residue that can be altered in a sequence selected from the group consisting of SEQ ID NO: 115 to SEQ ID NO: 171 without substantially altering the biological function. For example, amino acid residues that are conserved among the protease proteins of the present invention, are predicted to be particularly unamenable to alteration. Furthermore, amino acids conserved among the protease proteins according to the present invention and other proteases are not likely to be amenable to alteration.

The term “conservative substitution” is intended to mean a substitution in which the amino acid residue is replaced with an amino acid residue having a similar side chain. These families are known in the art and include amino acids with basic side chains (e.g., lysine, arginine and histidine), acidic side chains (e.g., aspartic acid, glutamic acid), uncharged polar side chains (e.g., glycine, asparagine, glutamine, serine, threonine, tyrosine, cysteine), non-polar side chains (e.g., alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan), beta-branched side chains (e.g., threonine, valine, isoleucine) and aromatic side chains (e.g., tyrosine, phenylalanine, tryptophan, histidine).
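For illustration only, the side-chain families listed above can be written down as a lookup table and used to test whether a given replacement falls within a family. The names SIDE_CHAIN_FAMILIES, shared_families and is_conservative are hypothetical, and treating a substitution as conservative when the two residues share at least one listed family is an assumption of this sketch.

```python
# Side-chain families as listed in the text, in one-letter code.
SIDE_CHAIN_FAMILIES = {
    "basic": set("KRH"),
    "acidic": set("DE"),
    "uncharged polar": set("GNQSTYC"),
    "non-polar": set("AVLIPFMW"),
    "beta-branched": set("TVI"),
    "aromatic": set("YFWH"),
}


def shared_families(original, replacement):
    """Return the names of the families that contain both residues."""
    a, b = original.upper(), replacement.upper()
    return [
        name
        for name, members in SIDE_CHAIN_FAMILIES.items()
        if a in members and b in members
    ]


def is_conservative(original, replacement):
    """True if the residue changes and both residues share a family."""
    if original.upper() == replacement.upper():
        return False  # not a substitution at all
    return bool(shared_families(original, replacement))


if __name__ == "__main__":
    for pair in [("K", "R"), ("D", "E"), ("V", "I"), ("K", "E")]:
        print(pair, is_conservative(*pair), shared_families(*pair))
```

Because a residue may belong to more than one family (valine, for instance, is listed as both non-polar and beta-branched), the helper reports every shared family rather than a single label.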

Functional nucleic acid equivalents may typically contain silent mutations or mutations that do not alter the biological function of the encoded polypeptide. Accordingly, the invention provides nucleic acid molecules encoding protease proteins that contain changes in amino acid residues that are not essential for a particular biological activity. Such protease proteins differ in amino acid sequence from a sequence selected from the group consisting of SEQ ID NO: 115 to SEQ ID NO: 171 yet retain at least one biological activity. In one embodiment the isolated nucleic acid molecule comprises a nucleotide sequence encoding a protein, wherein the protein comprises a substantially homologous amino acid sequence that is at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or more homologous to the amino acid sequence shown in a sequence selected from the group consisting of SEQ ID NO: 115 to SEQ ID NO: 171.
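As a non-limiting sketch, a single-nucleotide change can be classified as silent or not by translating the coding sequence before and after the change. The sketch assumes the standard genetic code and a coding sequence read in frame from its first nucleotide; the function names and the nine-nucleotide example are hypothetical and do not correspond to any SEQ ID NO.

```python
from itertools import product

# Standard genetic code, built in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate an in-frame coding sequence into one-letter amino acids."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def classify_mutation(cds, position, new_base):
    """Classify a point mutation at a 0-based nucleotide position.

    Returns "silent" if the encoded protein is unchanged, otherwise a
    label such as "E3K" (original residue, 1-based position, new residue).
    """
    mutated = cds[:position] + new_base + cds[position + 1:]
    before, after = translate(cds), translate(mutated)
    if before == after:
        return "silent"
    residue = position // 3
    return f"{before[residue]}{residue + 1}{after[residue]}"


if __name__ == "__main__":
    toy_cds = "ATGCTGGAA"  # hypothetical example encoding M-L-E
    print(classify_mutation(toy_cds, 5, "A"))  # third position of CTG
    print(classify_mutation(toy_cds, 6, "A"))  # first position of GAA
```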

For example, guidance concerning how to make phenotypically silent amino acid substitutions is provided in Bowie, J. U. et al., Science 247:1306-1310 (1990) wherein the authors indicate that there are two main approaches for studying the tolerance of an amino acid sequence to change. The first method relies on the process of evolution, in which mutations are either accepted or rejected by natural selection. The second approach uses genetic engineering to introduce amino acid changes at specific positions of a cloned gene and selects or screens to identify sequences that maintain functionality. As the authors state, these studies have revealed that proteins are surprisingly tolerant of amino acid substitutions. The authors further indicate which changes are likely to be permissive at a certain position of the protein. For example, most buried amino acid residues require non-polar side chains, whereas few features of surface side chains are generally conserved. Other such phenotypically silent substitutions are described in Bowie et al, supra, and the references cited therein.

An isolated nucleic acid molecule encoding a protease protein homologous to the protein selected from the group consisting of SEQ ID NO: 115 to SEQ ID NO: 171 can be created by introducing one or more nucleotide substitutions, additions or deletions into the coding nucleotide sequences according to a sequence selected from the group consisting of SEQ ID NO: 1 to SEQ ID NO: 57 or a sequence selected from the group consisting of SEQ ID NO: 58 to SEQ ID NO: 114 such that one or more amino acid substitutions, deletions or insertions are introduced into the encoded protein. Such mutations may be introduced by standard techniques, such as site-directed mutagenesis and PCR-mediated mutagenesis.

The term “functional equivalents” also encompasses orthologues of the A. niger protease protein. Orthologues of the A. niger protease protein are proteins that can be isolated from other strains or species and possess a similar or identical biological activity. Such orthologues can readily be identified as comprising an amino acid sequence that is substantially homologous to a sequence selected from the group consisting of SEQ ID NO: 115 to SEQ ID NO: 171.

As defined herein, the term “substantially homologous” refers to a first amino acid or nucleotide sequence which contains a sufficient or minimum number of identical or equivalent (e.g., with similar side chain) amino acids or nucleotides to a second amino acid or nucleotide sequence such that the first and the second amino acid or nucleotide sequences have a common domain. For example, amino acid or nucleotide sequences which contain a common domain having about 60%, preferably 65%, more preferably 70%, even more preferably 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identity or more are defined herein as sufficiently identical.
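Purely by way of example, the percentage thresholds recited above can be applied to two sequences that have already been aligned. The sketch below assumes the alignment is supplied with "-" as the gap character and counts identity over all aligned columns; both the denominator and the helper names are assumptions of the sketch and are not a definition of how homology is determined for the purposes of the invention.

```python
# Thresholds recited in the text, in ascending order.
THRESHOLDS = (60, 65, 70, 75, 80, 85, 90, 95, 96, 97, 98, 99)


def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns in which both sequences carry a gap are ignored; a column
    with a gap in only one sequence counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == "-" and y == "-")
    ]
    if not columns:
        raise ValueError("alignment contains no residues")
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


def highest_threshold_met(identity):
    """Return the highest recited threshold that is met, or None."""
    met = [t for t in THRESHOLDS if identity >= t]
    return met[-1] if met else None


if __name__ == "__main__":
    # Toy aligned peptides; not taken from the sequence listing.
    identity = percent_identity("MKTAYIAKQR", "MKTAYLAKQK")
    print(identity, highest_threshold_met(identity))
```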

Also, nucleic acids encoding other protease family members, which thus have a nucleotide sequence that differs from a sequence selected from the group consisting of SEQ ID NO: 1 to SEQ ID NO: 57 or a sequence selected from the group consisting of SEQ ID NO: 58 to SEQ ID NO: 114, are within the scope of the invention. Moreover, nucleic acids encoding protease proteins from different species which thus have a nucleotide sequence which differs from a sequence selected from the group consisting of SEQ ID NO: 1 to SEQ ID NO: 57 or a sequence selected from the group consisting of SEQ ID NO: 58 to SEQ ID NO: 114 are within the scope of the invention.

Nucleic acid molecules corresponding to variants (e.g. natural allelic variants) and homologues of the protease DNA of the invention can be isolated based on their homology to the protease nucleic acids disclosed herein using the cDNAs disclosed herein or a suitable fragment thereof, as a hybridisation probe according to standard hybridisation techniques preferably under highly stringent hybridisation conditions.

In addition to naturally occurring allelic variants of the protease sequence, the skilled person will recognise that changes can be introduced by mutation into the nucleotide sequences of a sequence selected from the group consisting of SEQ ID NO: 1 to SEQ ID NO: 57 or a sequence selected from the group consisting of SEQ ID NO: 58 to SEQ ID NO: 114 thereby leading to changes in the amino acid sequence of the protease protein without substantially altering the function of the protease protein.

In another aspect of the invention, improved protease proteins are provided. Improved protease proteins are proteins wherein at least one biological activity is improved. Such proteins may be obtained by randomly introducing mutations along all or part of the protease coding sequence, such as by saturation mutagenesis, and the resulting mutants can be expressed recombinantly and screened for biological activity. For instance, the art provides for standard assays for measuring the enzymatic activity of proteases and thus improved proteins may easily be selected.

In a preferred embodiment the protease protein has an amino acid sequence according to a sequence selected from the group consisting of SEQ ID NO: 115 to SEQ ID NO: 171. In another embodiment, the protease polypeptide is substantially homologous to the amino acid sequence according to a sequence selected from the group consisting of SEQ ID NO: 115 to SEQ ID NO: 171 and retains at least one biological activity of a polypeptide according to a sequence selected from the group consisting of SEQ ID NO: 115 to SEQ ID NO: 171, yet differs in amino acid sequence due to natural variation or mutagenesis as described above.

In a further preferred embodiment, the protease protein has an amino acid sequence encoded by an isolated nucleic acid fragment capable of hybridising to a nucleic acid according to a sequence selected from the group consisting of SEQ ID NO: 1 to SEQ ID NO: 57 or a sequence selected from the group consisting of SEQ ID NO: 58 to SEQ ID NO: 114, preferably under highly stringent hybridisation conditions.

Accordingly, the protease protein is a protein which comprises an amino acid sequence at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or more homologous to the amino acid sequence shown in a sequence selected from the group consisting of SEQ ID NO: 115 to SEQ ID NO: 171 and retains at least one functional activity of the polypeptide according to a sequence selected from the group consisting of SEQ ID NO: 115 to SEQ ID NO: 171.

Functional equivalents of a protein according to the invention can also be identified e.g. by screening combinatorial libraries of mutants, e.g. truncation mutants, of the protein of the invention for protease activity. In one embodiment, a variegated library of variants is generated by combinatorial mutagenesis at the nucleic acid level. A variegated library of variants can be produced by, for example, enzymatically ligating a mixture of synthetic oligonucleotides into gene sequences such that a degenerate set of potential protein sequences is expressible as individual polypeptides, or alternatively, as a set of larger fusion proteins (e.g., for phage display). There are a variety of methods that can be used to produce libraries of potential variants of the polypeptides of the invention from a degenerate oligonucleotide sequence. Methods for synthesizing degenerate oligonucleotides are known in the art (see, e.g., Narang (1983) Tetrahedron 39:3; Itakura et al. (1984) Annu. Rev. Biochem. 53:323; Itakura et al. (1984) Science 198:1056; Ike et al. (1983) Nucleic Acid Res. 11:477).
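To illustrate what is meant by a degenerate set of sequences, the following sketch expands an oligonucleotide written with IUPAC ambiguity codes into every concrete sequence it represents. The function names and the example oligonucleotides are hypothetical; the sketch says nothing about how such oligonucleotides are synthesized.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(oligo):
    """Number of distinct sequences represented by a degenerate oligo."""
    count = 1
    for base in oligo.upper():
        count *= len(IUPAC[base])
    return count


def expand_degenerate(oligo):
    """List every concrete sequence encoded by a degenerate oligo."""
    pools = [IUPAC[base] for base in oligo.upper()]
    return ["".join(bases) for bases in product(*pools)]


if __name__ == "__main__":
    print(expand_degenerate("GAR"))   # both glutamate codons
    print(degeneracy("NNK"))          # size of a fully randomized codon pool
```

Since the number of sequences multiplies at each ambiguous position, computing the degeneracy first is a cheap way to decide whether full enumeration is practical.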

In addition, libraries of fragments of the coding sequence of a polypeptide of the invention can be used to generate a variegated population of polypeptides for screening and subsequent selection of variants. For example, a library of coding sequence fragments can be generated by treating a double stranded PCR fragment of the coding sequence of interest with a nuclease under conditions wherein nicking occurs only about once per molecule, denaturing the double stranded DNA, renaturing the DNA to form double stranded DNA which can include sense/antisense pairs from different nicked products, removing single stranded portions from reformed duplexes by treatment with S1 nuclease, and ligating the resulting fragment library into an expression vector. By this method, an expression library can be derived which encodes N-terminal and internal fragments of various sizes of the protein of interest.

Several techniques are known in the art for screening gene products of combinatorial libraries made by point mutations or truncation, and for screening cDNA libraries for gene products having a selected property. The most widely used techniques, which are amenable to high-throughput analysis, for screening large gene libraries typically include cloning the gene library into replicable expression vectors, transforming appropriate cells with the resulting library of vectors, and expressing the combinatorial genes under conditions in which detection of a desired activity facilitates isolation of the vector encoding the gene whose product was detected. Recursive ensemble mutagenesis (REM), a technique which enhances the frequency of functional mutants in the libraries, can be used in combination with the screening assays to identify variants of a protein of the invention (Arkin and Youvan (1992) Proc. Natl. Acad. Sci. USA 89:7811-7815; Delagrave et al. (1993) Protein Engineering 6(3):327-331).

In addition to the protease gene sequence shown in a sequence selected from the group consisting of SEQ ID NO: 1 to SEQ ID NO: 57, it will be apparent to the person skilled in the art that DNA sequence polymorphisms that may lead to changes in the amino acid sequence of the protease protein may exist within a given population. Such genetic polymorphisms may exist in cells from different populations or within a population due to natural allelic variation. Allelic variants may also include functional equivalents.

Fragments of a polynucleotide according to the invention may also comprise polynucleotides not encoding functional polypeptides. Such polynucleotides may function as probes or primers for a PCR reaction. Such polynucleotides may also be useful when it is desired to abolish the functional activity of a protease in a particular organism (knock-out mutants).

Nucleic acids according to the invention, irrespective of whether they encode functional or non-functional polypeptides, can be used as hybridization probes or polymerase chain reaction (PCR) primers. Uses of the nucleic acid molecules of the present invention that do not encode a polypeptide having a protease activity include, inter alia, (1) isolating the gene encoding the protease protein, or allelic variants thereof, from a cDNA library, e.g. from organisms other than A. niger; (2) in situ hybridization (e.g. FISH) to metaphase chromosomal spreads to provide precise chromosomal location of the protease gene as described in Verma et al., Human Chromosomes: a Manual of Basic Techniques, Pergamon Press, New York (1988); (3) Northern blot analysis for detecting expression of protease mRNA in specific tissues and/or cells; and (4) probes and primers that can be used as a diagnostic tool to analyse the presence of a nucleic acid hybridisable to the protease probe in a given biological (e.g. tissue) sample.

Also encompassed by the invention is a method of obtaining a functional equivalent of a protease gene or cDNA. Such a method entails obtaining a labelled probe that includes an isolated nucleic acid which encodes all or a portion of the sequence according to a sequence selected from the group consisting of SEQ ID NO: 115 to SEQ ID NO: 171 or a variant thereof; screening a nucleic acid fragment library with the labelled probe under conditions that allow hybridisation of the probe to nucleic acid fragments in the library, thereby forming nucleic acid duplexes, and preparing a full-length gene sequence from the nucleic acid fragments in any labelled duplex to obtain a gene related to the protease gene.

In one embodiment, a protease nucleic acid of the invention is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more homologous to a nucleic acid sequence shown in a sequence selected from the group consisting of SEQ ID NO: 1 to SEQ ID NO: 57, a sequence selected from the group consisting of SEQ ID NO: 58 to SEQ ID NO: 114 or the complement thereof.

In another preferred embodiment a protease polypeptide of the invention is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more homologous to the amino acid sequence shown in a sequence selected from the group consisting of SEQ ID NO: 115 to SEQ ID NO: 171.

Host Cells

In another embodiment, the invention features cells, e.g., transformed host cells or recombinant host cells that contain a nucleic acid encompassed by the invention. A “transformed cell” or “recombinant cell” is a cell into which (or into an ancestor of which) has been introduced, by means of recombinant DNA techniques, a nucleic acid according to the invention. Both prokaryotic and eukaryotic cells are included, e.g., bacteria, fungi, yeast, and the like. Especially preferred are cells from filamentous fungi, in particular Aspergillus niger.

A host cell can be chosen that modulates the expression of the inserted sequences, or modifies and processes the gene product in a specific, desired fashion. Such modifications (e.g., glycosylation) and processing (e.g., cleavage) of protein products may facilitate optimal functioning of the protein.

Various host cells have characteristic and specific mechanisms for post-translational processing and modification of proteins and gene products. Appropriate cell lines or host systems familiar to those of skill in the art of molecular biology and/or microbiology can be chosen to ensure the desired and correct modification and processing of the foreign protein expressed. To this end, eukaryotic host cells that possess the cellular machinery for proper processing of the primary transcript, glycosylation, and phosphorylation of the gene product can be used. Such host cells are well known in the art.

Host cells also include, but are not limited to, mammalian cell lines such as CHO, VERO, BHK, HeLa, COS, MDCK, 293, 3T3, WI38, and choroid plexus cell lines.

If desired, the polypeptides according to the invention can be produced by a stably-transfected cell line. A number of vectors suitable for stable transfection of mammalian cells are available to the public; methods for constructing such cell lines are also publicly known, e.g., in Ausubel et al. (supra).

Antibodies

The invention further features antibodies, such as monoclonal or polyclonal antibodies, that specifically bind protease proteins according to the invention.

As used herein, the term “antibody” (Ab) or “monoclonal antibody” (Mab) is meant to include intact molecules as well as antibody fragments (such as, for example, Fab and F(ab′)₂ fragments) which are capable of specifically binding to protease protein. Fab and F(ab′)₂ fragments lack the Fc fragment of intact antibody, clear more rapidly from the circulation, and may have less non-specific tissue binding than an intact antibody (Wahl et al., J. Nucl. Med. 24:316-325 (1983)). Thus, these fragments are preferred.

The antibodies of the present invention may be prepared by any of a variety of methods. For example, cells expressing the protease protein or an antigenic fragment thereof can be administered to an animal in order to induce the production of sera containing polyclonal antibodies. In a preferred method, a preparation of protease protein is prepared and purified to render it substantially free of natural contaminants. Such a preparation is then introduced into an animal in order to produce polyclonal antisera of greater specific activity.

In the most preferred method, the antibodies of the present invention are monoclonal antibodies (or protease protein binding fragments thereof). Such monoclonal antibodies can be prepared using hybridoma technology (Kohler et al., Nature 256:495 (1975); Kohler et al., Eur. J. Immunol. 6:511 (1976); Hammerling et al., In: Monoclonal Antibodies and T-Cell Hybridomas, Elsevier, N.Y., pp. 563-681 (1981)). In general, such procedures involve immunizing an animal (preferably a mouse) with a protease protein antigen or with a protease protein-expressing cell. The splenocytes of such mice are extracted and fused with a suitable myeloma cell line. Any suitable myeloma cell line may be employed in accordance with the present invention; however, it is preferable to employ the parent myeloma cell line (SP₂O), available from the American Type Culture Collection, Rockville, Md. After fusion, the resulting hybridoma cells are selectively maintained in HAT medium, and then cloned by limiting dilution as described by Wands et al. (Gastroenterology 80:225-232 (1981)). The hybridoma cells obtained through such a selection are then assayed to identify clones which secrete antibodies capable of binding the protease protein antigen. In general, the polypeptides can be coupled to a carrier protein, such as KLH, as described in Ausubel et al., supra, mixed with an adjuvant, and injected into a host mammal.

In particular, various host animals can be immunized by injection of a polypeptide of interest. Examples of suitable host animals include rabbits, mice, guinea pigs, and rats. Various adjuvants can be used to increase the immunological response, depending on the host species, including but not limited to Freund's adjuvant (complete and incomplete), mineral gels such as aluminum hydroxide, surface active substances such as lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole limpet hemocyanin, dinitrophenol, BCG (bacille Calmette-Guerin) and Corynebacterium parvum. Polyclonal antibodies are heterogeneous populations of antibody molecules derived from the sera of the immunized animals.

Such antibodies can be of any immunoglobulin class including IgG, IgM, IgE, IgA, IgD, and any subclass thereof. The hybridomas producing the mAbs of this invention can be cultivated in vitro or in vivo.

Once produced, polyclonal or monoclonal antibodies are tested for specific recognition of a protease polypeptide or functional equivalent thereof in an immunoassay, such as a Western blot or immunoprecipitation analysis using standard techniques, e.g., as described in Ausubel et al., supra. Antibodies that specifically bind to protease proteins or functional equivalents thereof are useful in the invention. For example, such antibodies can be used in an immunoassay to detect protease in pathogenic or non-pathogenic strains of Aspergillus (e.g., in Aspergillus extracts).

Preferably, antibodies of the invention are produced using fragments of the protease polypeptides that appear likely to be antigenic, by criteria such as high frequency of charged residues. For example, such fragments may be generated by standard techniques of PCR, and then cloned into the pGEX expression vector (Ausubel et al., supra). Fusion proteins may then be expressed in E. coli and purified using a glutathione agarose affinity matrix as described in Ausubel et al., supra. If desired, several (e.g., two or three) fusions can be generated for each protein, and each fusion can be injected into at least two rabbits. Antisera can be raised by injections in a series, typically including at least three booster injections. Typically, the antisera are checked for their ability to immunoprecipitate a recombinant protease polypeptide or functional equivalents thereof, whereas unrelated proteins may serve as a control for the specificity of the immune reaction.

Alternatively, techniques described for the production of single chain antibodies (U.S. Pat. Nos. 4,946,778 and 4,704,692) can be adapted to produce single chain antibodies against a protease polypeptide or functional equivalents thereof. Kits for generating and screening phage display libraries are commercially available, e.g., from Pharmacia.

Additionally, examples of methods and reagents particularly amenable for use in generating and screening an antibody display library can be found in, for example, U.S. Pat. No. 5,223,409; PCT Publication No. WO 92/18619; PCT Publication No. WO 91/17271; PCT Publication No. WO 92/20791; PCT Publication No. WO 92/15679; PCT Publication No. WO 93/01288; PCT Publication No. WO 92/01047; PCT Publication No. WO 92/09690; PCT Publication No. WO 90/02809; Fuchs et al. (1991) Bio/Technology 9:1370-1372; Hay et al. (1992) Hum. Antibod. Hybridomas 3:81-85; Huse et al. (1989) Science 246:1275-1281; Griffiths et al. (1993) EMBO J. 12:725-734.

Polyclonal and monoclonal antibodies that specifically bind protease polypeptides or functional equivalents thereof can be used, for example, to detect expression of a protease gene or a functional equivalent thereof, e.g., in another strain of Aspergillus.

For example, protease polypeptide can be readily detected in conventional immunoassays of Aspergillus cells or extracts. Examples of suitable assays include, without limitation, Western blotting, ELISAs, radioimmune assays, and the like.

By “specifically binds” is meant that an antibody recognizes and binds a particular antigen, e.g., a protease polypeptide, but does not substantially recognize and bind other unrelated molecules in a sample.

Antibodies can be purified, for example, by affinity chromatography methods in which the polypeptide antigen is immobilized on a resin.

An antibody directed against a polypeptide of the invention (e.g., a monoclonal antibody) can be used to isolate the polypeptide by standard techniques, such as affinity chromatography or immunoprecipitation. Moreover, such an antibody can be used to detect the protein (e.g., in a cellular lysate or cell supernatant) in order to evaluate the abundance and pattern of expression of the polypeptide. The antibodies can also be used diagnostically to monitor protein levels in cells or tissue as part of a clinical testing procedure, e.g., to determine the efficacy of a given treatment regimen or in the diagnosis of Aspergillosis.

Detection can be facilitated by coupling the antibody to a detectable substance. Examples of detectable substances include various enzymes, prosthetic groups, fluorescent materials, luminescent materials, bioluminescent materials, and radioactive materials. Examples of suitable enzymes include horseradish peroxidase, alkaline phosphatase, β-galactosidase, or acetylcholinesterase; examples of suitable fluorescent materials include umbelliferone, fluorescein, fluorescein isothiocyanate, rhodamine, dichlorotriazinylamine fluorescein, dansyl chloride or phycoerythrin; an example of a luminescent material is luminol; examples of bioluminescent materials include luciferase, luciferin, and aequorin; and examples of suitable radioactive materials include ¹²⁵I, ¹³¹I, ³⁵S or ³H.

Preferred epitopes encompassed by the antigenic peptide are regions that are located on the surface of the protein, e.g., hydrophilic regions. Hydrophobicity plots of the proteins of the invention can be used to identify hydrophilic regions.
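As an illustration of how a hydrophobicity plot can point to hydrophilic regions, the following minimal sketch computes a sliding-window hydropathy profile. The Kyte-Doolittle scale, the window size of 7, the cutoff of -1.0 and the example sequence are all assumptions made for illustration; the description does not prescribe a particular scale or threshold, and the sequence shown is not one of the sequences of the invention.

```python
# Kyte-Doolittle hydropathy values; the choice of this scale is an assumption,
# since the description does not name a specific hydrophobicity scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=7):
    """Return the mean hydropathy of every window of `window` residues."""
    scores = [KD[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def hydrophilic_windows(seq, window=7, cutoff=-1.0):
    """Return (start index, peptide, mean score) for windows below `cutoff`."""
    return [(i, seq[i:i + window], round(score, 2))
            for i, score in enumerate(hydropathy_profile(seq, window))
            if score < cutoff]


# Hypothetical sequence used only to exercise the functions.
example = "MKLLVVAAILDKEERKDSGGFLLAV"

if __name__ == "__main__":
    for start, peptide, score in hydrophilic_windows(example):
        print(start, peptide, score)
```

Windows with strongly negative mean scores are candidate surface-exposed stretches; in practice such a prediction would be only one of several criteria for choosing an antigenic peptide.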

The antigenic peptide of a protein of the invention comprises at least 7 (preferably 10, 15, 20, or 30) contiguous amino acid residues of the amino acid sequence of a sequence selected from the group consisting of SEQ ID NO: 115 to SEQ ID NO: 171 and encompasses an epitope of the protein such that an antibody raised against the peptide forms a specific immune complex with the protein.

Preferred epitopes encompassed by the antigenic peptide are regions of protease that are located on the surface of the protein, e.g., hydrophilic regions, hydrophobic regions, alpha regions, beta regions, coil regions, turn regions and flexible regions.

Immunoassays

Qualitative or quantitative determination of a polypeptide according to the present invention in a biological sample can occur using any art-known method. Antibody-based techniques provide special advantages for assaying specific polypeptide levels in a biological sample.

In these, the specific recognition is provided by the primary antibody (polyclonal or monoclonal) but the secondary detection system can utilize fluorescent, enzyme, or other conjugated secondary antibodies. As a result, an immunocomplex is obtained.

Accordingly, the invention provides a method for diagnosing whether a certain organism is infected with Aspergillus comprising the steps of:

-   isolating a biological sample from said organism suspected to be infected with Aspergillus,
-   reacting said biological sample with an antibody according to the invention,
-   determining whether immune complexes are formed.

Tissues can also be extracted, e.g., with urea and neutral detergent, for the liberation of protein for Western-blot or dot/slot assay. This technique can also be applied to body fluids.

Other antibody-based methods useful for detecting protease gene expression include immunoassays, such as the enzyme linked immunosorbent assay (ELISA) and the radioimmunoassay (RIA). For example, protease-specific monoclonal antibodies can be used both as an immunoabsorbent and as an enzyme-labeled probe to detect and quantify the protease protein. The amount of protease protein present in the sample can be calculated by reference to the amount present in a standard preparation using a linear regression computer algorithm. In another ELISA assay, two distinct specific monoclonal antibodies can be used to detect protease protein in a biological fluid. In this assay, one of the antibodies is used as the immuno-absorbent and the other as the enzyme-labeled probe.
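The calculation of an unknown amount by reference to a standard preparation can be sketched as follows. This is a minimal illustration using an ordinary least-squares line; the standard concentrations, the absorbance readings and the function names are hypothetical, and a real assay would only be treated as linear within its validated range.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amount_from_signal(signal, slope, intercept):
    """Invert the standard curve to estimate the amount giving `signal`."""
    return (signal - intercept) / slope


# Hypothetical standard preparation: protease amount (ng/ml) vs. absorbance.
standards_ng_per_ml = [0.0, 10.0, 20.0, 40.0, 80.0]
absorbance = [0.05, 0.25, 0.45, 0.85, 1.65]

slope, intercept = fit_line(standards_ng_per_ml, absorbance)

if __name__ == "__main__":
    # Estimate the amount in a sample that gave an absorbance of 0.65.
    print(round(amount_from_signal(0.65, slope, intercept), 2))
```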

The above techniques may be conducted essentially as a “one-step” or “two-step” assay. The “one-step” assay involves contacting protease protein with immobilized antibody and, without washing, contacting the mixture with the labeled antibody. The “two-step” assay involves washing before contacting the mixture with the labeled antibody. Other conventional methods may also be employed as suitable. It is usually desirable to immobilize one component of the assay system on a support, thereby allowing other components of the system to be brought into contact with the component and readily removed from the sample.

Suitable enzyme labels include, for example, those from the oxidase group, which catalyze the production of hydrogen peroxide by reacting with substrate. Activity of an oxidase label may be assayed by measuring the concentration of hydrogen peroxide formed by the enzyme-labelled antibody/substrate reaction.

Besides enzymes, other suitable labels include radioisotopes, such as iodine (¹²⁵I, ¹²¹I), carbon (¹⁴C), sulphur (³⁵S), tritium (³H), indium (¹¹²In), and technetium (^(99m)Tc), and fluorescent labels, such as fluorescein and rhodamine, and biotin.

Specific binding of a test compound to a protease polypeptide can be detected, for example, in vitro by reversibly or irreversibly immobilizing the protease polypeptide on a substrate, e.g., the surface of a well of a 96-well polystyrene microtitre plate. Methods for immobilizing polypeptides and other small molecules are well known in the art. For example, the microtitre plates can be coated with a protease polypeptide by adding the polypeptide in a solution (typically, at a concentration of 0.05 to 1 mg/ml in a volume of 1-100 µl) to each well, and incubating the plates at room temperature to 37° C. for 0.1 to 36 hours. Polypeptides that are not bound to the plate can be removed by shaking the excess solution from the plate, and then washing the plate (once or repeatedly) with water or a buffer. Typically, the polypeptide is contained in water or a buffer. The plate is then washed with a buffer that lacks the bound polypeptide. To block the free protein-binding sites on the plates, the plates are blocked with a protein that is unrelated to the bound polypeptide. For example, 300 µl of bovine serum albumin (BSA) at a concentration of 2 mg/ml in Tris-HCl is suitable. Suitable substrates include those substrates that contain a defined cross-linking chemistry (e.g., plastic substrates, such as polystyrene, styrene, or polypropylene substrates from Corning Costar Corp. (Cambridge, Mass.), for example). If desired, a beaded particle, e.g., beaded agarose or beaded sepharose, can be used as the substrate.

Binding of the test compound to the polypeptides according to the invention can be detected by any of a variety of art-known methods. For example, a specific antibody can be used in an immunoassay. If desired, the antibody can be labeled (e.g., fluorescently or with a radioisotope) and detected directly (see, e.g., West and McMahon, J. Cell Biol. 74:264, 1977). Alternatively, a second antibody can be used for detection (e.g., a labeled antibody that binds the Fc portion of an anti-AN97 antibody). In an alternative detection method, the protease polypeptide is labeled, and the label is detected (e.g., by labeling a protease polypeptide with a radioisotope, fluorophore, chromophore, or the like). In still another method, the protease polypeptide is produced as a fusion protein with a protein that can be detected optically, e.g., green fluorescent protein (which can be detected under UV light). In an alternative method, the protease polypeptide can be covalently attached to or fused with an enzyme having a detectable enzymatic activity, such as horseradish peroxidase, alkaline phosphatase, α-galactosidase, or glucose oxidase. Genes encoding all of these enzymes have been cloned and are readily available for use by those of skill in the art. If desired, the fusion protein can include an antigen, and such an antigen can be detected and measured with a polyclonal or monoclonal antibody using conventional methods. Suitable antigens include enzymes (e.g., horseradish peroxidase, alkaline phosphatase, and α-galactosidase) and non-enzymatic polypeptides (e.g., serum proteins, such as BSA and globulins, and milk proteins, such as caseins).

Epitopes, Antigens and Immunogens.

In another aspect, the invention provides a peptide or polypeptide comprising an epitope-bearing portion of a polypeptide of the invention. The epitope of this polypeptide portion is an immunogenic or antigenic epitope of a polypeptide of the invention. An “immunogenic epitope” is defined as a part of a protein that elicits an antibody response when the whole protein is the immunogen. These immunogenic epitopes are believed to be confined to a few loci on the molecule. On the other hand, a region of a protein molecule to which an antibody can bind is defined as an “antigenic epitope.” The number of immunogenic epitopes of a protein generally is less than the number of antigenic epitopes. See, for instance, Geysen, H. M. et al., Proc. Natl. Acad. Sci. USA 81:3998-4002 (1984).

As to the selection of peptides or polypeptides bearing an antigenic epitope (i.e., that contain a region of a protein molecule to which an antibody can bind), it is well known in the art that relatively short synthetic peptides that mimic part of a protein sequence are routinely capable of eliciting an antiserum that reacts with the partially mimicked protein. See, for instance, Sutcliffe, J. G. et al., Science 219:660-666 (1983). Peptides capable of eliciting protein-reactive sera are frequently represented in the primary sequence of a protein, can be characterized by a set of simple chemical rules, and are confined neither to immunodominant regions of intact proteins (i.e., immunogenic epitopes) nor to the amino or carboxyl terminals. Peptides that are extremely hydrophobic and those of six or fewer residues generally are ineffective at inducing antibodies that bind to the mimicked protein; longer, soluble peptides, especially those containing proline residues, usually are effective. Sutcliffe et al., supra. For instance, 18 of 20 peptides designed according to these guidelines, containing 8-39 residues covering 75% of the sequence of the influenza virus hemagglutinin HA1 polypeptide chain, induced antibodies that reacted with the HA1 protein or intact virus; and 12/12 peptides from the MuLV polymerase and 18/18 from the rabies glycoprotein induced antibodies that precipitated the respective proteins.

Antigenic epitope-bearing peptides and polypeptides of the invention are therefore useful to raise antibodies, including monoclonal antibodies, that bind specifically to a polypeptide of the invention. Thus, a high proportion of hybridomas obtained by fusion of spleen cells from donors immunized with an antigenic epitope-bearing peptide generally secrete antibody reactive with the native protein. Sutcliffe et al., supra, at 663. The antibodies raised by antigenic epitope-bearing peptides or polypeptides are useful to detect the mimicked protein, and antibodies to different peptides may be used for tracking the fate of various regions of a protein precursor which undergoes post-translational processing. The peptides and anti-peptide antibodies may be used in a variety of qualitative or quantitative assays for the mimicked protein, for instance in competition assays, since it has been shown that even short peptides (e.g., about 9 amino acids) can bind and displace the larger peptides in immunoprecipitation assays. See, for instance, Wilson, I. A. et al., Cell 37:767-778 at 777 (1984). The anti-peptide antibodies of the invention also are useful for purification of the mimicked protein, for instance, by adsorption chromatography using methods well known in the art.

Antigenic epitope-bearing peptides and polypeptides of the invention designed according to the above guidelines preferably contain a sequence of at least seven, more preferably at least nine and most preferably between about 15 and about 30 amino acids contained within the amino acid sequence of a polypeptide of the invention. However, peptides or polypeptides comprising a larger portion of an amino acid sequence of a polypeptide of the invention, containing about 30 to about 50 amino acids, or any length up to and including the entire amino acid sequence of a polypeptide of the invention, also are considered epitope-bearing peptides or polypeptides of the invention and also are useful for inducing antibodies that react with the mimicked protein. Preferably, the amino acid sequence of the epitope-bearing peptide is selected to provide substantial solubility in aqueous solvents (i.e., the sequence includes relatively hydrophilic residues and highly hydrophobic sequences are preferably avoided); and sequences containing proline residues are particularly preferred.

The epitope-bearing peptides and polypeptides of the invention may be produced by any conventional means for making peptides or polypeptides including recombinant means using nucleic acid molecules of the invention. For instance, a short epitope-bearing amino acid sequence may be fused to a larger polypeptide which acts as a carrier during recombinant production and purification, as well as during immunization to produce anti-peptide antibodies.

Epitope-bearing peptides also may be synthesized using known methods of chemical synthesis. For instance, Houghten has described a simple method for synthesis of large numbers of peptides, such as 10-20 mg of 248 different 13-residue peptides representing single amino acid variants of a segment of the HA1 polypeptide which were prepared and characterized (by ELISA-type binding studies) in less than four weeks. Houghten, R. A., Proc. Natl. Acad. Sci. USA 82:5131-5135 (1985). This “Simultaneous Multiple Peptide Synthesis (SMPS)” process is further described in U.S. Pat. No. 4,631,211 to Houghten et al. (1986). In this procedure the individual resins for the solid-phase synthesis of various peptides are contained in separate solvent-permeable packets, enabling the optimal use of the many identical repetitive steps involved in solid-phase methods.

A manual procedure allows 500-1000 or more syntheses to be conducted simultaneously. Houghten et al., supra, at 5134.

Epitope-bearing peptides and polypeptides of the invention are used to induce antibodies according to methods well known in the art. See, for instance, Sutcliffe et al., supra; Wilson et al., supra; Chow, M. et al., Proc. Natl. Acad. Sci. USA 82:910-914; and Bittle, F. J. et al., J. Gen. Virol. 66:2347-2354 (1985).

Generally, animals may be immunized with free peptide; however, anti-peptide antibody titer may be boosted by coupling of the peptide to a macromolecular carrier, such as keyhole limpet hemocyanin (KLH) or tetanus toxoid. For instance, peptides containing cysteine may be coupled to carrier using a linker such as maleimidobenzoyl-N-hydroxysuccinimide ester (MBS), while other peptides may be coupled to carrier using a more general linking agent such as glutaraldehyde.

Animals such as rabbits, rats and mice are immunized with either free or carrier-coupled peptides, for instance, by intraperitoneal and/or intradermal injection of emulsions containing about 100 µg peptide or carrier protein and Freund's adjuvant. Several booster injections may be needed, for instance, at intervals of about two weeks, to provide a useful titer of anti-peptide antibody which can be detected, for example, by ELISA assay using free peptide adsorbed to a solid surface. The titer of anti-peptide antibodies in serum from an immunized animal may be increased by selection of anti-peptide antibodies, for instance, by adsorption to the peptide on a solid support and elution of the selected antibodies according to methods well known in the art.

Immunogenic epitope-bearing peptides of the invention, i.e., those parts of a protein that elicit an antibody response when the whole protein is the immunogen, are identified according to methods known in the art. For instance, Geysen et al., 1984, supra, discloses a procedure for rapid concurrent synthesis on solid supports of hundreds of peptides of sufficient purity to react in an enzyme-linked immunosorbent assay. Interaction of synthesized peptides with antibodies is then easily detected without removing them from the support. In this manner a peptide bearing an immunogenic epitope of a desired protein may be identified routinely by one of ordinary skill in the art. For instance, the immunologically important epitope in the coat protein of foot-and-mouth disease virus was located by Geysen et al. with a resolution of seven amino acids by synthesis of an overlapping set of all 208 possible hexapeptides covering the entire 213 amino acid sequence of the protein. Then, a complete replacement set of peptides in which all 20 amino acids were substituted in turn at every position within the epitope were synthesized, and the particular amino acids conferring specificity for the reaction with antibody were determined. Thus, peptide analogs of the epitope-bearing peptides of the invention can be made routinely by this method. U.S. Pat. No. 4,708,781 to Geysen (1987) further describes this method of identifying a peptide bearing an immunogenic epitope of a desired protein.
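The peptide sets referred to above can be enumerated with a few lines of code. The sketch below is illustrative only: the 213-residue placeholder sequence and the six-residue epitope are hypothetical, and the function names are not taken from the cited references. It shows that a protein of n residues yields n - 6 + 1 overlapping hexapeptides (208 for n = 213), and that a replacement set in which all 20 amino acids are placed in turn at every position of a six-residue epitope contains 120 peptides, counting the copies identical to the parent.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def overlapping_peptides(seq, length=6):
    """Return every contiguous peptide of `length` residues, offset by one."""
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


def replacement_set(epitope):
    """Return (position, residue, peptide) with each of the 20 amino acids
    substituted in turn at every position of `epitope`."""
    variants = []
    for pos in range(len(epitope)):
        for aa in AMINO_ACIDS:
            variants.append((pos, aa, epitope[:pos] + aa + epitope[pos + 1:]))
    return variants


# Hypothetical 213-residue placeholder, not a real coat protein sequence.
placeholder_protein = (AMINO_ACIDS * 11)[:213]
# Hypothetical six-residue epitope.
placeholder_epitope = "ACDEFG"

if __name__ == "__main__":
    print(len(overlapping_peptides(placeholder_protein)))
    print(len(replacement_set(placeholder_epitope)))
```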

Further still, U.S. Pat. No. 5,194,392 to Geysen (1990) describes a general method of detecting or determining the sequence of monomers (amino acids or other compounds) which is a topological equivalent of the epitope (i.e., a “mimotope”) which is complementary to a particular paratope (antigen binding site) of an antibody of interest. More generally, U.S. Pat. No. 4,433,092 to Geysen (1989) describes a method of detecting or determining a sequence of monomers which is a topographical equivalent of a ligand which is complementary to the ligand binding site of a particular receptor of interest. Similarly, U.S. Pat. No. 5,480,971 to Houghten, R. A. et al. (1996) on Peralkylated Oligopeptide Mixtures discloses linear C1-C7-alkyl peralkylated oligopeptides and sets and libraries of such peptides, as well as methods for using such oligopeptide sets and libraries for determining the sequence of a peralkylated oligopeptide that preferentially binds to an acceptor molecule of interest. Thus, non-peptide analogs of the epitope-bearing peptides of the invention also can be made routinely by these methods.

Removal or Reduction of Protease Activity

The present invention also relates to methods for producing a mutant cell of a parent cell, which comprises disrupting or deleting a nucleic acid sequence encoding the protease or a control sequence thereof, which results in the mutant cell producing less of the protease than the parent cell.

The construction of strains which have reduced protease activity may be conveniently accomplished by modification or inactivation of a nucleic acid sequence necessary for expression of the protease activity in the cell. The nucleic acid sequence to be modified or inactivated may be, for example, a nucleic acid sequence encoding the protease or a part thereof essential for exhibiting protease activity, or the nucleic acid sequence may have a regulatory function required for the expression of the protease from the coding sequence of the nucleic acid sequence. An example of such a regulatory or control sequence may be a promoter sequence or a functional part thereof, i.e., a part which is sufficient for affecting expression of the protease. Other control sequences for possible modification include, but are not limited to, a leader, a polyadenylation sequence, a propeptide sequence, a signal sequence, and a termination site.

Modification or inactivation of the nucleic acid sequence may be performed by subjecting the cell to mutagenesis and selecting for cells in which the protease producing capability has been reduced or eliminated. The mutagenesis, which may be specific or random, may be performed, for example, by use of a suitable physical or chemical mutagenizing agent, by use of a suitable oligonucleotide, or by subjecting the DNA sequence to PCR generated mutagenesis. Furthermore, the mutagenesis may be performed by use of any combination of these mutagenizing agents.

Examples of a physical or chemical mutagenizing agent suitable for the present purpose include ultraviolet (UV) irradiation, hydroxylamine, N-methyl-N′-nitro-N-nitrosoguanidine (MNNG), O-methyl hydroxylamine, nitrous acid, ethyl methane sulphonate (EMS), sodium bisulphite, formic acid, and nucleotide analogues. When such agents are used, the mutagenesis is typically performed by incubating the cell to be mutagenized in the presence of the mutagenizing agent of choice under suitable conditions, and selecting for cells exhibiting reduced or no expression of protease activity.

Modification or inactivation of production of a protease of the present invention may be accomplished by introduction, substitution, or removal of one or more nucleotides in the nucleic acid sequence encoding the protease or a regulatory element required for the transcription or translation thereof. For example, nucleotides may be inserted or removed so as to result in the introduction of a stop codon, the removal of the start codon, or a change of the open reading frame. Such modification or inactivation may be accomplished by site-directed mutagenesis or PCR generated mutagenesis in accordance with methods known in the art.
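The three kinds of change mentioned above can be illustrated on a toy open reading frame. This is a minimal sketch under stated assumptions: the 18-nucleotide sequence is hypothetical and unrelated to the sequences of the invention, the standard genetic code is assumed, and the `translate` helper is a simplification that requires an ATG at the first position and ignores any incomplete trailing codon.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate from the first base; return '' if there is no ATG start.
    Translation ends at the first stop codon or at the last complete codon."""
    if not dna.startswith("ATG"):
        return ""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical open reading frame: ATG GCT AAG TGG CTT GAA
orf = "ATGGCTAAGTGGCTTGAA"

# Substitution G->A in the fourth codon (TGG -> TGA) introduces a stop codon.
stop_mutant = orf[:11] + "A" + orf[12:]
# Removal of one nucleotide after the start codon shifts the reading frame.
frameshift_mutant = orf[:3] + orf[4:]
# Substitution G->A in the first codon (ATG -> ATA) removes the start codon.
no_start_mutant = orf[:2] + "A" + orf[3:]

if __name__ == "__main__":
    for name, seq in [("parent", orf), ("stop", stop_mutant),
                      ("frameshift", frameshift_mutant),
                      ("no start", no_start_mutant)]:
        print(name, translate(seq))
```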

Although, in principle, the modification may be performed in vivo, i.e., directly on the cell expressing the nucleic acid sequence to be modified, it is preferred that the modification be performed in vitro as exemplified below.

An example of a convenient way to inactivate or reduce production by a host cell of choice is based on techniques of gene replacement or gene interruption. For example, in the gene interruption method, a nucleic acid sequence corresponding to the endogenous gene or gene fragment of interest is mutagenized in vitro to produce a defective nucleic acid sequence which is then transformed into the host cell to produce a defective gene. By homologous recombination, the defective nucleic acid sequence replaces the endogenous gene or gene fragment. It may be desirable that the defective gene or gene fragment also encodes a marker which may be used for selection of transformants in which the gene encoding the protease has been modified or destroyed.

Alternatively, modification or inactivation of the nucleic acid sequence encoding a protease of the present invention may be performed by established anti-sense techniques using a nucleotide sequence complementary to the protease encoding sequence. More specifically, production of the protease by a cell may be reduced or eliminated by introducing a nucleotide sequence complementary to the nucleic acid sequence encoding the protease which may be transcribed in the cell and is capable of hybridizing to the protease mRNA produced in the cell. Under conditions allowing the complementary antisense nucleotide sequence to hybridize to the protease mRNA, the amount of protease translated is thus reduced or eliminated.
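The complementarity on which the antisense approach relies can be sketched as follows. The coding fragment is hypothetical and the helper names are illustrative; the example only shows how a sequence complementary to a coding strand, and hence to the corresponding mRNA, is derived.

```python
# Translation table for DNA base pairing.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence (written 5' to 3')."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def antisense_rna(coding_dna):
    """Return the antisense RNA (5' to 3') that pairs with the mRNA
    transcribed from `coding_dna`."""
    return reverse_complement(coding_dna).replace("T", "U")


# Hypothetical coding-strand fragment, not a sequence of the invention.
coding_fragment = "ATGGCTAAG"

if __name__ == "__main__":
    print(reverse_complement(coding_fragment))
    print(antisense_rna(coding_fragment))
```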

It is preferred that the cell to be modified in accordance with the methods of the present invention is of microbial origin, for example, a fungal strain which is suitable for the production of desired protein products, either homologous or heterologous to the cell.

The present invention further relates to a mutant cell of a parent cell which comprises a disruption or deletion of a nucleic acid sequence encoding the protease or a control sequence thereof, which results in the mutant cell producing less of the protease than the parent cell.

The protease-deficient mutant cells so created are particularly useful as host cells for the expression of homologous and/or heterologous polypeptides. Therefore, the present invention further relates to methods for producing a homologous or heterologous polypeptide comprising (a) culturing the mutant cell under conditions conducive for production of the polypeptide; and (b) recovering the polypeptide. In the present context, the term “heterologous polypeptides” is defined as polypeptides which are not native to the host cell, a native protein in which modifications have been made to alter the native sequence, or a native protein whose expression is quantitatively altered as a result of a manipulation of the host cell by recombinant DNA techniques.

The methods of the present invention for producing an essentially protease-free product are of particular interest in the production of eukaryotic polypeptides, in particular fungal proteins such as enzymes. The protease-deficient cells may also be used to express heterologous proteins of interest for the food industry, or of pharmaceutical interest.

Use of Proteases in Industrial Processes

The invention also relates to the use of the protease according to the invention in a selected number of industrial and pharmaceutical processes. Despite the long-term experience obtained with these processes, the protease according to the invention features a number of significant advantages over the enzymes currently used. Depending on the specific application, these advantages can include aspects like lower production costs, higher specificity towards the substrate, lower antigenicity, fewer undesirable side activities, higher yields when produced in a suitable microorganism, more suitable pH and temperature ranges, better taste of the final product as well as food grade and kosher aspects.

In large scale industrial applications aimed at food or feed production, proteolytic enzymes are commonly used to improve aspects like protein solubility, extraction yields, viscosity or taste, texture, nutritional value, minimisation of antigenicity or antinutritional factors, colour or functionality as well as processing aspects like filterability of the proteinaceous raw material. In these applications the proteinaceous raw material can be of animal or vegetable origin and examples include vegetable proteins such as soy protein, wheat gluten, rape seed protein, pea protein, alfalfa protein, sunflower protein, fabaceous bean protein, cotton or sesame seed protein, maize protein, barley protein, sorghum protein, potato protein, rice protein, coffee proteins, and animal derived protein such as milk protein (e.g. casein, whey protein), egg white, fish protein, meat protein including gelatin, collagen, blood protein (e.g. haemoglobin), hair, feathers and fish meal.

An important aspect of the proteases according to the invention is that they cover a whole range of pH and temperature optima which are ideally suited for a variety of applications. For example, many large scale processes benefit from relatively high processing temperatures of 50 degrees C. or higher to control the risks of microbial infections. Several proteases according to the invention comply with this demand but at the same time do not exhibit heat stabilities so extreme that they resist attempts to inactivate the enzyme by an additional heat treatment. The latter feature allows production routes that yield final products free of residual proteolytic activity. Similarly, many feed and food products have slightly acidic pH values so that for their processing proteases with acidic or near neutral pH optima are preferred. A protease according to the invention complies with this requirement as well.

The specificity of endoproteases is usually defined in terms of preferential cleavages of bonds between the carboxyl of the amino acid residue in position P1 and the amino group of the residue in position P1′ respectively. The preference may be conditioned predominantly either by P1 (e.g. positively charged residues in substrates for trypsin), by P1′ (e.g. hydrophobic residues in cleavages by thermolysin) or by both P1 and P1′ (e.g. specific cleavages between two positively charged residues by adrenal medulla serine endoprotease). In some cases more distant residues may determine the cleavage preference, e.g. P2 for streptococcal peptidase A. Some residues are known to influence cleavages negatively; it is well known that bonds with proline in position P1′ are resistant to the action of many proteases. Most endoproteases cleave preferentially either in a hydrophobic environment or in the proximity of negatively charged residues. For example, industrially available endoproteases like chymotrypsin (obtained from bovine pancreas) or subtilisin, neutral metallo endoprotease or thermolysin (all obtained from Bacillus species) tend to favour cleavage “behind” hydrophobic amino acids like -Phe, -Leu and -Tyr. Other industrially available endoproteases are trypsin (obtained from bovine pancreas) preferring cleavage behind -Arg and -Lys and papain (a complex mixture of various enzymes including proteases obtained from papaya fruits) preferring cleavage behind -Arg.
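The P1/P1′ terminology can be made concrete with a simplified in silico digest. The sketch below is an assumption-laden illustration, not a model of any enzyme of the invention: it cuts "behind" each residue of a given P1 set unless the following (P1′) residue is proline, and it ignores all other subsite effects. The substrate sequence and the function name are hypothetical.

```python
def cleave(seq, p1_residues, blocked_p1_prime="P"):
    """Cut after every residue in `p1_residues` (position P1) unless the
    next residue (position P1') is in `blocked_p1_prime`."""
    fragments = []
    start = 0
    for i in range(len(seq) - 1):
        if seq[i] in p1_residues and seq[i + 1] not in blocked_p1_prime:
            fragments.append(seq[start:i + 1])
            start = i + 1
    fragments.append(seq[start:])
    return fragments


# Hypothetical substrate sequence.
substrate = "AGKLRPFYKSA"

# Simplified P1 preferences taken from the text: trypsin behind Arg and Lys,
# chymotrypsin behind Phe, Leu and Tyr.
TRYPSIN_LIKE = "KR"
CHYMOTRYPSIN_LIKE = "FLY"

if __name__ == "__main__":
    print(cleave(substrate, TRYPSIN_LIKE))
    print(cleave(substrate, CHYMOTRYPSIN_LIKE))
```

In the trypsin-like digest the Arg-Pro bond is left intact because proline occupies P1′, which mirrors the resistance of such bonds described above.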

In contrast, peptide bonds formed by small sized residues such as Ala, Gly, Ser and Thr as well as Ile and Pro are poor substrates (Keil, B et al.; Protein Seq Data Anal (1993) 5; 401-407). This situation has profound implications for the pharmaceutical, the food and beverages, the agro and even the chemical industry. A protease according to the invention exhibits uncommon cleavage preferences.

The exopeptidases act only near the ends of polypeptide chains. Those acting at a free N-terminus liberate a single amino acid residue (so-called aminopeptidases) or a dipeptide or a tripeptide (so-called dipeptidyl-peptidases and tripeptidyl-peptidases). Those acting at a free C-terminus liberate a single residue (so-called carboxypeptidases) or a dipeptide (so-called peptidyl-dipeptidases). The carboxypeptidases are allocated to three groups on the basis of catalytic mechanism, i.e. serine-type carboxypeptidases, metallocarboxypeptidases and cysteine-type carboxypeptidases. Other exopeptidases are specific for dipeptides (so-called dipeptidases) or are able to cleave peptide linkages other than those of alpha-carboxyl or alpha-amino groups (so-called omega peptidases). Examples of such new omega peptidases are the pyroglutamyl-peptidase and the acylaminoacyl-peptidase as identified in the present invention (see Table 1, genes 18 and 45 respectively).

Typical examples of industrial applications which depend on the use of pure endoproteases and in which the protease according to the invention can be expected to deliver a superior performance include the processing of materials of vegetable or animal origin. These processing steps can be aimed at modifying a large array of characteristics of either the crude material or the (partially) purified protein fraction. For example, these processing steps can be aimed at maximising product solubilities, filterabilities, separabilities, protein extraction yields and digestibilities or minimising toxicities, off-tastes and viscosities. Furthermore the treatment can be directed at altering physico-chemical characteristics of the crude material or the purified (or partially purified) protein. These advantages apply not only if the endoprotease according to the invention is applied as a processing aid in industrial applications but also if applied as an active enzyme component in animal feed. Specifically the endoprotease according to the invention can be applied as a bread improver in the bakery industry, e.g. to retard the staling of bread or to diminish the viscosity of doughs. The endoprotease can also be used in the beer and wine industry to prevent or to minimise the formation of undesirable protein hazes. Alternatively it can be used in the beer industry to optimise the protein extraction yields of cereals used in the preparation of the wort. Furthermore, it can also be advantageously used in the dairy industry as a milk clotting agent with superior characteristics or to optimise the texturising, foaming or setting characteristics of various milk components. Another application in the dairy industry is the use of the new protease in the preparation of Enzyme Modified Cheeses (EMCs).

Moreover, various proteinaceous substrates can be subjected to an endoprotease according to the invention, usually in combination with other proteolytic enzymes, to obtain hydrolysates for medical or non-medical applications. Here the endoprotease according to the invention is surprisingly effective in achieving a complete hydrolysis of the proteinaceous substrate, so that even protease resistant parts are fully hydrolysed. The endoprotease is also surprisingly active in minimising the allergenicity of the final hydrolysate and in suppressing the formation of bitter off-tastes.

More specifically the endoprotease according to the invention is characterised by its preference for cleaving proteins at unusual peptide bonds, especially with the small size amino acid residues Ala, Gly, Ser and Thr, or the residues Ile and Pro, in either the P1 or the P1′ position (Keil, B et al.; Protein Seq Data Anal (1993) 5; 401-407). As a result, those fractions of the proteinaceous starting materials that resist hydrolysis by prior art endoproteases can be dissolved and hydrolysed using the endoprotease according to the invention. Non-limiting examples of such protease resistant fractions include so-called extensins in plant materials, and collagen, gelatin and also specific milk components in material of animal origin.

Various feedstuffs such as e.g. soybeans contain trypsin inhibitors. These proteins inhibit trypsin activity in the GI-tract of e.g. pigs and poultry. This trypsin inhibiting activity results in sub-optimal protein digestibility in these animals resulting in increased waste production and poor economics. This problem may partly be overcome by toasting soybeans at high temperatures. Two different types of trypsin inhibitors have been identified in soybeans, i.e. the Bowman-Birk type trypsin inhibitors and the Kunitz type trypsin inhibitors.

This invention now provides an alternative to toasting for degrading trypsin inhibiting activity, in that it provides cysteine proteases (EC 3.4.22, Table 1) capable of cleaving the Leucine176-Aspartate177 peptide bond near the carboxyl-terminus of the Kunitz type trypsin inhibitor (as reviewed by Wilson (1988) in CRC Critical Reviews in Biotechnology 8 (3): 197-216). This results in inactivation of this trypsin inhibitor in soybean. It was surprisingly found that the cysteine proteases secreted by the fungus Aspergillus niger fulfilled these criteria far better than similar enzymes derived from other organisms.

Proteases are also widely used in the art of cheese-making. In the production of cheese it is necessary to coagulate the cheese milk to be able to separate the cheese matters, e.g. casein, from the whey. Several milk coagulating enzymes, also referred to as coagulants, have been described and include (bovine) chymosin, bovine pepsin, porcine pepsin as well as microbial enzymes like Rhizomucor miehei protease, Rhizomucor pusillus protease and Cryphonectria parasitica protease. Chymosin can be obtained from calf stomachs but can also be produced microbially by, for example, Kluyveromyces lactis. All these enzymes are characterized by having specificity for the peptide bond between residue 105 (phenylalanine) and residue 106 (methionine), or the bond adjacent to that, in kappa-casein. This means that by employing these enzymes in cheese making, the kappa-casein is split at the junction between para-kappa-casein and the macro-peptide moiety called glycomacropeptide (GMP) carrying the negative charges. When this occurs the macropeptide diffuses into the whey, its stabilizing effect on the solubility of the casein micelles is lost, and the casein micelles can start to aggregate once sufficient kappa-casein has been hydrolyzed. For further elaboration on the enzymatic coagulation of milk, see e.g. D. G. Dalgleish in Advanced Dairy Chemistry vol. 1, ed. by P. F. Fox, Elsevier, London, 1992.

The currently available coagulants allow for a rather high yield of cheese. However, it should be realised that due to the enormous volumes of cheese produced, an increased yield in the order of magnitude of tenths of percent points may constitute a substantial economic advantage. Consequently there is a great need in the art for coagulants with an (even slightly) improved yield.

Coagulants are characterized by their high substrate specificity, which is, however, dependent on pH and temperature. In a typical cheese making process the pH will change from the initial pH 6.3 to lower pH values in the range of 4.5-5.5; the end-value depends on the conditions used during the cheese production process. Some coagulants are more sensitive to pH changes than others. The Rhizomucor pusillus protease, for example, is more sensitive to pH changes than chymosin. Besides pH, other parameters like temperature and water content may also affect the protease specificity. It is well known that most coagulants show a changing substrate specificity with changing pH, resulting in altered proteolytic activity in later stages of the cheese making process. It is also well known that coagulants differ in the extent of casein proteolysis; they may also show differences in the peptide patterns produced during proteolysis. These are relevant factors during cheese ripening and may affect cheese properties like taste, flavor and texture. In some cases coagulants give rise to undesired effects like the formation of bitter tasting peptides or off-taste. In addition, changes in proteolytic specificity may lead to a reduction in yield. Pepsin, a well known component in many bovine chymosin preparations, is an example of a protease that gives rise to lower yields and taste effects as compared to pure chymosin. There is still a need for coagulants which give rise to new, improved cheese texture and taste. Such new coagulants result in the accelerated development of taste and texture profiles related to cheese aging, thereby providing a substantial economic benefit.

It is well known that free amino acids are very important in taste and flavour generation. Especially the amino acids leucine, phenylalanine, methionine and valine play an important role in the generation of typical cheese taste and flavor components. The free amino acids are converted via fermentation, by micro-organisms that are added during the cheese manufacturing process, into the actual flavor and taste generating compounds like methanethiol, dimethyldisulphide, methylpropanoic acid and methylpropanal. Exo-peptidases play an important role in the generation of free amino acids. They can only be effective, however, when they are combined with an endo-protease of appropriate specificity. Appropriate combinations of exo- and endo-peptidases can be used in cheese making, resulting in the manufacture of cheeses with new and improved taste profiles.

The enzymes according to the invention may be used to hydrolyze proteinaceous materials of animal origin such as whole milk, skim milk, casein, whey protein or mixtures of casein and whey protein. Such mixtures of casein and whey protein may be used, for example, in ratios similar to those found in human milk. Furthermore, the enzyme mixture according to the invention may be used to hydrolyze proteinaceous materials of plant origin such as, for example, wheat gluten, malted or unmalted barley or other cereals used for making beer, soy milk, concentrates or isolates thereof, maize protein concentrates and isolates thereof, and rice proteins.

Within the area of large scale industrial processes, some applications rely on the use of endoproteases only, whereas in other applications combinations of endoproteases with exoproteases are essential. Typical examples which depend on the use of pure endoproteases and in which the protease according to the invention can deliver a superior performance include applications like the processing of soy, pea or cereal proteins aimed at minimising viscosities or optimising foaming or other physico-chemical characteristics, bread improvers in the bakery industry also aimed at diminishing the viscosity of doughs, processing aids in the beer and wine industry aimed at the prevention of protein hazes or optimising the extraction yields of cereals, feed additives in the bio industry aimed at enhancing intestinal absorption or modulating microbial activities in the gut, and processing aids in the dairy industry aimed at optimising the clotting, foaming or setting characteristics of various milk components. Moreover, for specific market segments proteins derived from milk, soy or collagen are exposed to proteases to produce so-called protein hydrolysates. Although the main outlets for these protein hydrolysates are infant formula and food products for hospitalised persons, products intended for persons with non-medical needs, such as athletes or people on a slimming diet, form a rapidly growing segment. In all of these applications protein hydrolysates offer attractive advantages such as lowered allergenicities, facilitated gastrointestinal uptake, less chemical deterioration of desirable amino acids like glutamine and cysteine and, finally, absence of proteinaceous precipitations in acid beverages during prolonged storage periods. All these advantages can be combined if the hydrolysate is offered as a mixture of di- and tripeptides. However, currently all commercially available hydrolysates are produced by combining several endoproteases.
The latter approach implies a non-uniform and incomplete degradation of the protein. To obtain the desired mixture of di- and tripeptides, a hydrolysis process involving a combination of various di- and tripeptidylpeptidases would be ideal. Unfortunately, only few of these enzymes from food grade and industrially acceptable microorganisms are known, let alone industrially available. According to the invention several highly useful di- and tripeptidylpeptidases are economically obtainable in a relatively pure state. Preferred are those di- or tripeptidylpeptidases that exhibit a low selectivity towards the substrate to be cleaved, i.e. exhibit only minimal amino acid residue cleavage preferences. Preferred are combinations of those di- or tripeptidylpeptidases that hydrolyse a high percentage of the naturally occurring peptide bonds. Despite this high activity towards naturally occurring peptide bonds, a total hydrolysis to free amino acids is prevented by the nature of the di- and tripeptidylpeptidases. Also preferred are those di- or tripeptidylpeptidases that are optimally active between pH 4 and 8 and exhibit adequate temperature stability. Adequate temperature stability implies that at least 40%, preferably at least 60%, more preferably between 70 and 100% of the initial hydrolytic activity survives after heating the enzyme together with the substrate for 1 hour at 50 degrees C.
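The temperature stability criterion stated above can be expressed as a simple calculation. The following is a minimal sketch only: the function names and the class labels are hypothetical, the activity values are in arbitrary units, and only the 40%, 60% and 70% thresholds are taken from the text.

```python
# Sketch of the stability criterion: residual activity after 1 hour at
# 50 degrees C in the presence of substrate. Names and labels are illustrative.
def residual_activity_percent(initial_activity, activity_after_heating):
    """Percentage of the initial hydrolytic activity that survives heating."""
    if initial_activity <= 0:
        raise ValueError("initial activity must be positive")
    return 100.0 * activity_after_heating / initial_activity


def stability_class(residual_percent):
    """Map residual activity onto the thresholds named in the text."""
    if residual_percent >= 70.0:
        return "more preferred"
    if residual_percent >= 60.0:
        return "preferred"
    if residual_percent >= 40.0:
        return "adequate"
    return "not adequate"


# Hypothetical measurement: 12.0 units before and 7.8 units after heating.
residual = residual_activity_percent(12.0, 7.8)
print(round(residual, 1), stability_class(residual))
```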

Although the process towards an efficient production of mixtures of di- or tripeptides, or of di- and tripeptides, hinges on the availability of the enzymes according to the invention, the first enzyme incubation with the proteinaceous substrate will usually be with an endoprotease, preferably a broad spectrum endoprotease suited for the situation, e.g. subtilisin (Delvolase from DSM), neutral metallo protease (Neutrase from NOVO) or thermolysin (Thermoase from Daiwa Kasei) for near neutral conditions and pepsin or aspergillopepsin (e.g. Sumizyme AP from Shin Nihon, Japan) for acidic conditions. The aim of this first digestion is to improve the solubility, to reduce the viscosity and to reduce the heat setting characteristics of the water/protein mixture. Furthermore this pretreatment with an endoprotease is essential to create enough starting points for the di- and tripeptidylpeptidases, thereby accelerating the process of di- or tripeptide formation. Optionally a protease intended for debittering of the hydrolysate can be included in this stage of the process or later, together with the di- or tripeptidylpeptidases.

The main aim of the latter hydrolysates is to minimize the allergenicity of the product or to facilitate gastrointestinal uptake. In the production of such hydrolysates the use of dipeptidyl- and tripeptidyl-peptidases is of special importance as these enzymes offer an efficient way of producing hydrolysates.

Other applications in these food and feed industries totally rely upon combinations of one or more endoprotease(s) with one or more exoprotease(s). Such combinations of an endoprotease with an exoprotease are typically used in industries to improve aspects like taste and colour of the final product. The reason for this is that the development of taste and colour is largely dependent upon the presence of free amino acids. Free amino acids can not only be obtained by exoproteases such as carboxypeptidases and aminopeptidases but also by peptidyl-dipeptidases. If combined with endoproteases or even dipeptidyl-or tripeptidyl-peptidases, carboxypeptidases, aminopeptidases and peptidyl-dipeptidases can create larger quantities of free amino acids in less time. However, in all of these processes an uncontrolled release of amino acids or even non-proteinaceous components should be avoided to minimise undesirable side reactions.

Though free amino acids as such can elicit a number of taste impressions, these taste impressions are very basic (bitter, sweet, sour and “umami”) and the amino acid concentrations required for perceiving these tastes are high. Despite these high threshold values, free amino acids are able to create major sensory effects at much lower concentration ranges through a number of flavour enhancing mechanisms. One of these mechanisms involves the combination of free amino acids with sugars in so-called Maillard reactions. Compared with free amino acids, these Maillard products can develop overwhelmingly complex flavour and odour systems with threshold values that are several orders of magnitude lower than those recorded for the free amino acids. Maillard products are formed at elevated temperatures, usually during cooking, baking or roasting when preparing food or feed products. During these treatments both colour and a large array of aromas develop. In these reactions amino groups react with reducing compounds as a first step, ultimately leading to a whole family of reaction pathways. In foods or feeds the amino compounds involved are predominantly free amino acids which are released from the proteinaceous raw material by various proteases, and the required reducing compounds are primarily reducing sugars. The implication is that during the processing of the raw material undesired release of free amino acids and sugars should be avoided to minimise off-tastes that could be generated during subsequent heating steps, e.g. during spray drying or sterilisation. The latter notion emphasises once more the benefits of superior purity and low in-use costs of the enzyme according to the invention.

Apart from Maillard reactions, amino acids can also undergo important chemical transitions at ambient temperatures. The latter type of transitions are enzyme dependent and are quite common in fermented foods such as beer and yogurt and in cheese ripening and meat and wine maturation processes. In these fermentation processes, free amino acids are liberated from the raw materials used by the proteases added or by proteolytic enzyme activity from the raw material or the microbial starters used. During the maturation phase microbial metabolic activity then converts the free amino acids into derivatives with increased sensory properties. For example, L-leucine, L-isoleucine and L-valine lead to the formation of valuable fusel alcohols like amyl alcohols and isobutanol in beer fermentation. Similarly, cheese volatiles such as methanethiol and dimethyldisulphide have been traced back to the occurrence of methionine in cheese, as have methylpropanoic acid and methylpropanal to valine. Finally, the free amino acid glutamate can create strong savoury enhancing effects because of its synergy with the breakdown products of RNA, so-called 5′-ribonucleotides. If combined with proper concentrations of 5′-ribonucleotides such as 5′-IMP and 5′-GMP, the detection threshold of the umami taste generated by glutamate is known to be lowered by almost two orders of magnitude.

In order to obtain pronounced and precise taste effects in all of these processes, the proteinaceous substrates should be hydrolysed using a combination of an endo- and an exoprotease, wherein at least one of the endo or exoprotease, preferably both the endo- and exoprotease, are pure and preferably selective towards a specific set of amino acid(s) or preferentially release the preferred amino acid(s). So preferred proteases are characterised by a high selectivity towards the amino acid sequences that can be cleaved which notion makes the enzyme category in Aspergillus known as “maturases” of particular importance.

Apart from the food and feed industries, proteases are also commonly applied by the chemical, pharmaceutical, diagnostic and personal care industries.

In the personal care industry proteases are used to create peptides which are added to a variety of products to improve aspects like skin feel, gloss or protection. Moreover there is a new tendency towards direct topical application of the protease. Very similar to the enzyme use in the leather industry, the prime aim in the latter application is to clean, dehair and soften the skin.

In the chemical and pharmaceutical industry proteases are being developed as valuable tools in producing costly ingredients or intermediates. In these industries proteases are not only used because of their hydrolytic capacity but also because of their capacity to synthesise peptides from natural or non-natural amino acids. The latter option is clearly demonstrated by the possibility to synthesize aspartame from its amino acid based building blocks by using an endoprotease like thermolysin.

Unlike the situation in the food and feed industry, the stereo- and regioselectivity of proteases are also considered important assets, although unusual reaction conditions may be needed to accomplish the desired chemical transformation. Typical examples of the application of proteases in this industry include the use of endoproteases, aminopeptidases as well as carboxypeptidases in the production of various intermediates for drugs like insulin, antibiotics, renin and ACE-inhibitors. An overview of such uses is presented in Industrial Biotransformations, A. Liese, K. Seelbach, C. Wandrey, Wiley-VCH; ISBN 3-527-30094-5.

In view of the desired specificities, stereo- and regioselectivities, the absence of side activities and resistance to unusual reaction conditions such as high solvent concentrations, the improved performance of the protease according to the invention offers substantial advantages.

From a pharmaceutical point of view the role of proteases is illustrated by a substantial number of references in Martindale's, “The Extra Pharmacopoeia” (Pharmaceutical Press, London, UK). Moreover the important role of very specific proteases in regulating all kinds of biological processes is illustrated by the fact that many hormones become active only after the processing of a mostly inactive precursor molecule by such a very specific protease. Inhibitors active towards certain categories of such specific proteases have been implicated in the development of all kinds of new drugs. Therefore new and effective inhibitors for proteases may now be identified using the sequences provided herein.

The entire disclosure of each document cited herein is hereby incorporated by reference.

TABLE 1

SEQ ID number
Gene  cDNA  Protein  Function of encoded protein  EC number
1     58    115      Pepsin A₃                    EC3.4.23.1
2     59    116      Metalloprotease              EC3.4.24.56
3     60    117      acylaminoacyl-peptidase      EC3.4.19.1
4     61    118      Tripeptidylaminopeptidase    EC3.4.14.-
5     62    119      serine carboxypeptidase      EC3.4.16.6
6     63    120      Serine endoprotease          EC3.4.21.-
7     64    121      Carboxypeptidase Y           EC3.4.16.5
8     65    122      aspergillopepsin II - hom    EC3.4.23.19
9     66    123      Tripeptidyl peptidase        EC3.4.14.9
10    67    124      Tripeptidyl peptidase        EC3.4.14.9
11    68    125      aspergillopepsin II - hom    EC3.4.23.19
12    69    126      Tripeptidyl peptidase        EC3.4.14.9
13    70    127      Metalloprotease              EC3.4.24.-
14    71    128      aspergillopepsin I           EC3.4.23.18
15    72    129      Pepsinogen E                 EC3.4.23.25
16    73    130      aspergillopepsin I - hom     EC3.4.23.18
17    74    131      aspergillopepsin II          EC3.4.23.19
18    75    132      Pyro-Glu peptidase           EC3.4.19.3
19    76    133      dipeptidyl peptidase         EC3.4.14.2
20    77    134      Secr. aminopeptidase         EC3.4.11.10
21    78    135      alkaline D-peptidase         EC3.4.16.4
22    79    136      Carboxypeptidase             EC3.4.16.1
23    80    137      Carboxypeptidase             EC3.4.16.1
24    81    138      Carboxypeptidase-II          EC3.4.16.1
25    82    139      aspartic proteinase          EC3.4.23.-
26    83    140      Tripeptidyl peptidase        EC3.4.14.9
27    84    141      Carboxypeptidase             EC3.4.16.1
28    85    142      cysteine proteinase          EC3.4.22.-
29    86    143      Metallocarboxypeptidase      EC3.4.17.-
30    87    144      Subtilisin hom.              EC3.4.21.62
31    88    145      Carboxypeptidase Y           EC3.4.16.5
32    89    146      Metalloprotease              EC3.4.24.-
33    90    147      Carboxypeptidase Y           EC3.4.16.5
34    91    148      Metalloprotease              EC3.4.24.-
35    92    149      Tripeptidyl peptidase        EC3.4.14.9
36    93    150      Aspartic protease            EC3.4.23.24
37    94    151      Aspartic protease            EC3.4.23.24
38    95    152      Pepsin A₃                    EC3.4.23.1
39    96    153      Aspartic protease            EC3.4.23.24
40    97    154      Aspartic protease            EC3.4.23.24
41    98    155      Kex                          EC3.4.21.61
42    99    156      Serine protease              EC3.4.21.-
43    100   157      Glutamyl endoprotease        EC3.4.21.82
44    101   158      aspergillopepsin II - hom    EC3.4.23.19
45    102   159      acylaminoacyl-peptidase      EC3.4.19.1
46    103   160      Tripeptidylaminopeptidase    EC3.4.14.-
47    104   161      serine carboxypeptidase      EC3.4.16.6
48    105   162      Gly-X carboxypeptidase       EC3.4.17.4
49    106   163      aspartic proteinase          EC3.4.23.-
50    107   164      Tripeptidyl peptidase        EC3.4.14.9
51    108   165      Carboxypeptidase-I           EC3.4.16.1
52    109   166      serine carboxypeptidase      EC3.4.16.6
53    110   167      serine carboxypeptidase      EC3.4.16.6
54    111   168      Secr. aminopeptidase         EC3.4.11.10
55    112   169      Prolyl endopeptidase         EC3.4.21.26
56    113   170      aspergillopepsin I - hom     EC3.4.23.18
57    114   171      Aminopeptidase               EC3.4.11.-

EXAMPLES

Example 1

Assaying Proteolytic Activity and Specificity

Protease specificity may be explored by using various peptide substrates. Synthetic substrates are widely used to detect proteolytic enzymes in screening, in fermentation and during isolation, to assay enzyme activity, to determine enzyme concentrations, to investigate specificity and to explore interaction with inhibitors. Peptide p-nitroanilides are preferably used to assay protease activity as the activity can be followed continuously, which allows kinetic measurement. The cleavage of peptide p-nitroanilides can be followed by measuring the increase in absorption at 410 nm upon release of the 4-nitroaniline. p-Nitroanilide substrates are generally used for serine and cysteine proteases. In addition, peptide thioesters and 7-amino-4-methylcoumarin peptide derivatives are used. Peptide thioesters are very sensitive substrates for serine and metalloproteases that exhibit a relatively high turnover rate, since the thioester bond is easier to cleave than the amide bond. Cleavage of thioesters may be followed with a thiol reagent such as 4,4′-dithiodipyridine (324 nm) or 5,5′-dithiobis(2-nitrobenzoic acid) (405 nm). The same increased turnover rate is usually observed for the cleavage of ester bonds relative to amide bonds. The best known substrates to assay the esterase activity of proteases are p-nitrophenol derivatives. The release of p-nitrophenol can be monitored at different wavelengths depending on the pH that is used, e.g. around neutral pH a wavelength of 340 nm is used while above pH 9 monitoring is done around 405 nm. In addition, the hydrolysis of esters can also be followed by titration using pH-stat equipment. In case of qualitative measurement of esterase activity, pH sensitive dyes can be applied.
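By way of illustration, the continuous p-nitroanilide assay can be converted into a hydrolysis rate with the Beer-Lambert law. The sketch below is not part of the described procedure: the function name is hypothetical, the molar absorption coefficient of 8800 M⁻¹·cm⁻¹ at 410 nm is an assumed, commonly used value that should be verified for the buffer and instrument in use, and a 1 cm light path is assumed.

```python
# Sketch: convert an absorbance slope (410 nm) into micromoles of
# p-nitroaniline released per minute, using the Beer-Lambert law.
ASSUMED_EPSILON_PNA = 8800.0  # M^-1 cm^-1 at 410 nm; assumption, verify locally


def pna_release_rate(delta_a_per_min, assay_volume_ml,
                     epsilon=ASSUMED_EPSILON_PNA, path_cm=1.0):
    """Return the release rate in micromoles per minute."""
    molar_per_min = delta_a_per_min / (epsilon * path_cm)   # mol/L per minute
    moles_per_min = molar_per_min * assay_volume_ml / 1000.0
    return moles_per_min * 1e6                               # micromoles/min


# Hypothetical reading: 0.088 absorbance units per minute in a 1 ml assay.
print(pna_release_rate(0.088, 1.0))
```

If one unit is taken as one micromole of substrate hydrolysed per minute, the returned value corresponds directly to the number of units present in the assay.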

As an alternative, peptides may be attached to a fluorescent leaving group. Proteolysis is accompanied by an increase in fluorescence when monitored at the appropriate wavelengths. Peptidyl 2-naphthylamides and peptidyl 4-methyl-7-coumarylamides are commonly used. The release of, for example, 7-amino-4-methylcoumarin is measured using an excitation wavelength of 350 nm and an emission wavelength of 460 nm. The use of 7-amino-4-trifluoromethylcoumarin has the advantage of the leaving group being both chromogenic (absorption 380 nm) and fluorogenic (excitation 400 nm, emission 505 nm). When it is essential that an amino acid is present at both sides of the scissile bond, the introduction of a group that quenches the fluorescence might be useful. The general characteristic of such substrates is that the peptide sequence separates a fluorescent donor group from an acceptor group that acts as a quencher of fluorescence. Cleavage of a peptide bond between the quenching group and the fluorophore will lead to a substantial increase in fluorescence. Several donor-acceptor pairs have been reported, including o-aminobenzoic acid (Abz) as the donor and 2,4-dinitrophenyl (Dnp) as the acceptor, and 5-[(2′-aminoethyl)amino]naphthalenesulfonic acid (EDANS) as the donor and 4-[[4′-(dimethylamino)phenyl]azo]benzoic acid (DABCYL) as the acceptor. Abz/EDDnp represents a very convenient donor-acceptor pair since after total hydrolysis the fluorescence increases by a factor of 7 to 100 and the absorption spectrum of EDDnp does not change with pH. Moreover, the peptide sequence may contain up to 10 residues without loss of the quenching effect. As the size of the connecting peptide increases, the position of the scissile bond may become less specific. Therefore, in addition to establishing whether proteolysis occurred, additional analysis of the products may be required. This may be done by analysing and separating the produced peptides by HPLC and determining the amino acid sequence of the fragments. In addition, the peptide composition of the digest may be directly analysed by using a combined HPLC/mass spectrometry technique.

Apart from using peptides of a defined sequence, synthetic peptide libraries can also be used to study protease specificity. Peptides are synthesised by solid phase synthesis in random or semi-random fashion. For example, Meldal et al. (PNAS USA 91, 3314, 1994) report the preparation of a family of protease substrates by starting with H-Lys(Abz)-resin, extending the resin with peptides to a length of six amino acids, and finally coupling Tyr(NO2) to the peptides. Each resin bead has a unique sequence and on treatment with the proteases the most susceptible beads become fluorescent as the Tyr(NO2) containing peptide is released. Sequence analysis of the peptides on the susceptible beads will give information on the specificity of the protease.

Protease activity is usually expressed in units. Generally the international standard unit (IU) is defined as the amount of enzyme which, under defined conditions, converts one micromole of substrate per minute. Specifically with proteases the IU would relate to the hydrolysis of one micromole of peptide bonds per minute. However, in the case of protease units, deviations from the international definition are more the rule than the exception. Whereas with model peptides, which are cleaved specifically at one bond, the calculation of IUs is straightforward, for proteinaceous substrates, where the protease can cleave at various positions to a varying degree, many deviating unit definitions are used. Apart from a definition of the unit used, any hydrolysis experiment requires an adequate description of the conditions under which the units are measured. Such conditions comprise e.g. the substrate concentration, the enzyme-substrate ratio, the pH and the temperature. Typical assays for determining the specific activity of a protease comprise a proteinaceous substrate such as, for example, denatured hemoglobin, insulin or casein. The polypeptide substrate is digested by a protease at fixed conditions during a fixed time interval. Undigested and large polypeptides are precipitated with TCA and the TCA-soluble product is determined by measuring absorbance at 220 or 280 nm, or by titrating the soluble peptides with Folin reagent, ninhydrin, fluoro-2,4-dinitrobenzene/dansyl chloride, the TNBS method or fluorescein. Instead of labeling the product after hydrolysis, polypeptide substrates may also be used which are already labeled with specific dyes or fluorophores such as, for example, fluorescein. In addition, standard methods of amino acid analysis may be applied using standard laboratory analyzers. In order to get insight into the size distribution of the peptides generated by a protease, gel chromatography experiments may be performed. In addition to this, HPLC using reverse phase techniques is applied in order to get better resolution of the peptide patterns generated by the protease. The course of the hydrolysis of proteinaceous substrates is usually expressed as the degree of hydrolysis or DH. In case a pH-stat is used to follow the course of hydrolysis, DH can be derived from the base consumption during hydrolysis (Enzymatic Hydrolysis of Food Protein, J. Adler-Nissen, 1986, Elsevier Applied Science Publishers LTD). The DH is related to various useful functional properties of the hydrolysate such as solubility, emulsifying capacity, foaming and foam stability, whipping expansion and organoleptic quality. In addition, taste is an important aspect of food grade hydrolysates. Bitterness can be a major problem in protein hydrolysates. Termination of the hydrolysis reaction may be done by changing the pH, by heat inactivation, or by denaturing agents such as SDS, acetonitrile etc.
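The derivation of DH from base consumption in a pH-stat experiment can be sketched as follows. This is an illustrative calculation only, following the commonly cited pH-stat relation in which the base consumed is corrected for the degree of dissociation of the liberated alpha-amino groups. The function name is hypothetical, and the default values for the pK of the alpha-amino groups (7.0) and for the total number of peptide bonds per gram of protein (8.0 meq/g) are assumptions that depend on temperature and on the protein used.

```python
# Sketch: degree of hydrolysis (DH, %) from pH-stat base consumption.
# Default pK and h_tot are assumed placeholder values, not recommendations.
def degree_of_hydrolysis(base_ml, base_normality, protein_g, ph,
                         pk_amino=7.0, h_tot_meq_per_g=8.0):
    """DH (%) = 100 * h / h_tot, where h is the number of peptide bonds
    cleaved in meq per gram of protein."""
    # Fraction of liberated alpha-amino groups that is deprotonated at this pH.
    ratio = 10.0 ** (ph - pk_amino)
    alpha = ratio / (1.0 + ratio)
    # ml * eq/L = meq of base consumed; correct for dissociation and mass.
    h = (base_ml * base_normality) / (alpha * protein_g)
    return 100.0 * h / h_tot_meq_per_g


# Hypothetical run: 4 ml of 1 N base, 10 g protein, pH equal to the assumed pK.
print(degree_of_hydrolysis(4.0, 1.0, 10.0, ph=7.0))
```

At a pH well above the pK the correction factor alpha approaches 1, so that the base consumption then approximates the number of bonds cleaved directly.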

Polypeptides shown in Table 1 were expressed and at least partially purified according to standard procedures known in the art. They were analysed according to at least one of the methods described above and found to have the activities listed in Table 1.

Example 2

Direct Determination of the kcat/Km Ratio for Protease Substrates.

Synthetic substrates can be used to monitor the enzymatic activity during purification, to determine enzyme concentration, to determine inhibition constants or to investigate the substrate specificity. Determination of the kcat/Km ratio gives a measure of the substrate specificity. It allows comparison of the specificity of different substrates for the same enzyme, or of hydrolysis rates with different enzymes cleaving the same substrate. This ratio has the unit of a second order rate constant and is therefore expressed as 1/(concentration·time). Substrates having a kcat/Km ratio in the range 10⁵-10⁶ M⁻¹·sec⁻¹ are considered to be very good substrates, i.e. good affinity and rapid turn-over. However, some substrates may be very specific with kcat/Km values in the 10⁴ M⁻¹·sec⁻¹ range.

The kcat/Km ratio may be calculated after determination of the individual parameters. In that case, Km and Vm may be obtained from various linear plots (e.g. the Hanes or Cornish-Bowden method) or by a non-linear regression method. Knowing that Vm=kcat·Et (where Et is the final active enzyme concentration), then kcat=Vm/Et. Determination of the kcat/Km ratio by the previous method may be prevented when product or substrate inhibition occurs, or when the substrate precipitates at high concentration. It is however possible to obtain an accurate value of the kcat/Km ratio by working under first-order conditions, i.e. at a substrate concentration far below the estimated Km. Under these conditions, the Michaelis-Menten equation v=(Vm·S)/(Km+S) becomes v=(Vm·S)/Km since S<<Km, or v=(Vm/Km)·S=kobs·S=−dS/dt, which integrates as lnS=−kobs·t+lnSo, where So is the starting substrate concentration and S the substrate concentration at a given time. The velocity is proportional to the substrate concentration. In other words, the substrate hydrolysis obeys a first order process with kobs as the first-order rate constant: kobs=Vm/Km=(kcat·Et)/Km, since Vm=kcat·Et.

Continuous recording of the substrate hydrolysis allows the graphical determination of kobs from the lnS vs time graph. The kcat/Km ratio is simply inferred from kobs, provided the active enzyme concentration is known: kcat/Km=kobs/Et
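The graphical determination described above can be sketched in Python as a least-squares fit of ln S against time, followed by division of kobs by the active enzyme concentration. The data below are synthetic, and the function names, rate constant and concentrations are assumptions chosen only to illustrate the calculation.

```python
import math


def fit_kobs(times, substrate):
    """Least-squares slope of ln(S) versus t; kobs is minus that slope."""
    y = [math.log(s) for s in substrate]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in zip(times, y))
    return -sxy / sxx


def kcat_over_km(kobs, active_enzyme_conc):
    """kcat/Km = kobs/Et, valid only under first-order conditions (S << Km)."""
    return kobs / active_enzyme_conc


# Synthetic first-order decay (assumed values): So = 1 micromolar, kobs = 0.01 per second.
k_true = 0.01
times = [0, 30, 60, 90, 120]  # seconds
substrate = [1e-6 * math.exp(-k_true * t) for t in times]

kobs = fit_kobs(times, substrate)
ratio = kcat_over_km(kobs, active_enzyme_conc=1e-8)  # Et = 10 nM, result in 1/(M*s)
print(f"kobs = {kobs:.4f} 1/s, kcat/Km = {ratio:.3g} 1/(M*s)")
```

With real recordings, S is first computed from the product signal as S=So−P before taking the logarithm.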

Assay method: Use a starting substrate concentration far below the estimated Km and a low enzyme concentration to allow the substrate hydrolysis to be recorded. You will obtain a first-order curve for the product generation.

After total hydrolysis of the substrate, the absorbance (or fluorescence units) of the product will allow the accurate determination of So, since Pt=So. kobs is determined from the slope of the lnS vs time graph or alternatively using a fitting software (Enzfitter, SigmaPlot . . . ).

NB: Do not forget to calculate the substrate concentration for any given time from the product concentration (S=So−P) since plotting P vs time would not provide the correct kobs (dP/dt=kobs·S does not integrate in the same way).

Alternatively, one can measure successive t½ (half-times) from the product appearance curve, since in a first order process t½=ln2/kobs=0.693/kobs, so that kobs=0.693/t½

Using this method allows one to check that the decay is truly first order (identical values for the successive t½).
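The half-time check can be illustrated with a short Python sketch that reads successive half-times off a product curve by linear interpolation. The curve below is synthetic, the recording is assumed to start at P=0, and all names and numbers are illustrative assumptions rather than data from the Examples.

```python
import math


def time_at_product_level(times, product, level):
    """Linearly interpolate the time at which the product curve reaches `level`."""
    points = list(zip(times, product))
    for (t0, p0), (t1, p1) in zip(points, points[1:]):
        if p0 <= level <= p1 and p1 > p0:
            return t0 + (level - p0) * (t1 - t0) / (p1 - p0)
    raise ValueError("level not reached in the recorded curve")


def successive_half_times(times, product, s0, count=3):
    """Durations of the 1st, 2nd, ... halving of the remaining substrate."""
    crossings = [time_at_product_level(times, product, s0 * (1.0 - 0.5 ** i))
                 for i in range(1, count + 1)]
    starts = [times[0]] + crossings[:-1]
    return [c - s for c, s in zip(crossings, starts)]


# Synthetic first-order product curve (assumed values): So = 1.0, kobs = 0.02 per second.
k_true = 0.02
s0 = 1.0
times = list(range(0, 301))  # one reading per second
product = [s0 * (1.0 - math.exp(-k_true * t)) for t in times]

half_times = successive_half_times(times, product, s0)
kobs_estimates = [math.log(2) / t_half for t_half in half_times]
print(half_times, kobs_estimates)
```

If the successive half-times drift systematically, the reaction is not first order under the chosen conditions and kobs should not be converted into kcat/Km.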

Example 3

Inactivating Protease Genes in Aspergillus

The most convenient way of inactivating protease genes in the genome of Aspergillus is the technique of gene replacement (also called “one step gene disruption”). The basics of this technique have been described by Rothstein R J in Meth. Enzymol. 101, p202, 1983. Essentially, the technique is based on homologous recombination of transformed DNA fragments with the genomic DNA of a fungal cell. Via double crossover, the gene to be inactivated is (partly) replaced by the DNA fragment with which the cell is transformed. Preferably, the transformed DNA fragment contains a selectable marker gene for Aspergillus niger. Basically, the manipulation of DNA and the generation of an inactivation construct are done using general molecular biological techniques. First, genomic DNA is isolated from the Aspergillus niger strain that is later used for the inactivation of the protease gene. Genomic DNA of A. niger can be isolated by any of the techniques described in the literature and known to the person skilled in the art, e.g. by the method described by de Graaff et al. (1988) Curr. Genet. 13, 315-321. This genomic DNA is used as template for amplification of the flanking regions of the protease gene by the polymerase chain reaction (PCR; Sambrook et al. (1989) Molecular cloning, a laboratory manual, 2nd edition, Cold Spring Harbor Laboratory Press, New York). Flanking regions here means the non-coding regions upstream and downstream of the protease gene that will be inactivated. Preferably the flanking regions should each be more than 1.0 kb in length.

Two single stranded DNA oligonucleotides are used for the priming of the PCR amplification of each flanking region. For the 5′-flanking region, one primer is homologous to a DNA sequence upstream of the start of the coding sequence of the protease gene. Preferably the homologous region is located more than 1.0 kb upstream of the translation start site. The second primer is homologous to the complementary and inverse DNA sequence located immediately upstream of the coding sequence of the protease gene.

For the 3′-flanking region, one primer is homologous to the DNA sequence immediately downstream of the coding sequence of the protease gene. The second primer is homologous to a complementary and inverse DNA sequence located preferably more than 1.0 kb downstream of the coding sequence of the protease gene.

The DNA sequence included in all primers and homologous to the A. niger genome should be minimally 15 nucleotides in length, preferably more than 18 nucleotides in length. Most conveniently, all primers should contain a DNA sequence coding for the recognition site of suitable restriction enzymes upstream of the sequence that is homologous to the A. niger genome. These extra recognition sites facilitate the cloning process.

Both primers and the genomic DNA of A. niger are used in a PCR reaction under conditions known to those skilled in the art. The annealing temperature of the primers can be calculated from the part of the DNA sequence that is homologous to the A. niger genome. Both fragments, containing the 5′-flanking region and the 3′-flanking region respectively, are cloned into a vector that can be propagated in E. coli using general molecular biological techniques. A gene that can be used as selection marker in Aspergillus niger is then cloned in between the two flanking regions. Most conveniently the marker gene is under control of a promoter that is expressed in A. niger, preferably an endogenous A. niger promoter. The orientation of the insertion of the marker gene is preferably in the same direction as the original protease gene. The final inactivation fragment contains the 5′-flanking region, a selection marker gene preferably under control of an A. niger endogenous promoter, and the 3′-flanking region, all in this direction and orientation. DNA of the final construct is cloned into a vector that can be propagated in E. coli.
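The primer layout described above (a restriction site upstream of at least 15, preferably more than 18, homologous nucleotides, with the second primer taken from the complementary and inverse strand) can be sketched in Python as follows. The toy sequence, the EcoRI and HindIII sites chosen and the function names are illustrative assumptions. The melting temperature uses the common 2 °C per A/T plus 4 °C per G/C rule of thumb, which is one of several possible estimates and is not prescribed by this Example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complementary and inverse sequence, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def make_primer(homologous, restriction_site, min_len=15):
    """Place the restriction site upstream (5') of the genome-homologous part."""
    if len(homologous) < min_len:
        raise ValueError(f"homologous part should be at least {min_len} nucleotides")
    return restriction_site.upper() + homologous.upper()


def flank_primers(genome, flank_start, flank_end, site_fwd, site_rev, homology=18):
    """Forward and reverse primers amplifying genome[flank_start:flank_end]."""
    fwd = make_primer(genome[flank_start:flank_start + homology], site_fwd)
    rev = make_primer(reverse_complement(genome[flank_end - homology:flank_end]), site_rev)
    return fwd, rev


def rule_of_thumb_tm(homologous):
    """Approximate melting temperature in deg C: 2 per A/T plus 4 per G/C."""
    s = homologous.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


# Toy sequence standing in for a flanking region (hypothetical, not A. niger sequence).
toy_genome = "ATGCATGCATGCATGCATTTGACCGTAGGCTAACGTTAGCCGATCGGATCCATAGC"
fwd, rev = flank_primers(toy_genome, 0, len(toy_genome),
                         site_fwd="GAATTC",   # EcoRI site, example choice
                         site_rev="AAGCTT")   # HindIII site, example choice
tm_fwd = rule_of_thumb_tm(fwd[6:])  # only the homologous part is counted
print(fwd, rev, tm_fwd)
```

Only the homologous part of each primer is passed to the temperature estimate, in line with the statement that the annealing temperature is calculated from the part of the sequence that is homologous to the genome.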

The inactivation construct is digested with suitable restriction enzymes to remove the E. coli vector sequences and the inactivation fragment is isolated using standard techniques (Sambrook et al. (1989) Molecular cloning, a laboratory manual, 2nd edition, Cold Spring Harbor Laboratory Press, New York). Finally Aspergillus niger is transformed with the inactivation fragment using a method described in the literature, e.g. the method described by Kusters-van Someren et al. (1991) Curr. Genet. 20, 293-299. Transformed cells are selected by plating the transformation mixture on agar plates that are selective for growth of Aspergillus niger strains that express the marker gene. After purification of the transformed Aspergillus strains by replica plating, a representative number of strains is analysed by Southern blotting using standard methods (Sambrook et al. (1989) Molecular cloning, a laboratory manual, 2nd edition, Cold Spring Harbor Laboratory Press, New York). To this end, genomic DNA of mycelium of transformed strains is isolated and digested with suitable restriction enzymes. Restriction fragments are separated using agarose gel electrophoresis, blotted to nitrocellulose membranes and probed with a labeled fragment of the marker gene. Hybridization and washing are carried out under stringent conditions. Strains that contain labeled restriction fragments of the correct length are considered correct.

Using this method A. niger strains can be selected with an inactivated protease gene of choice.

Example 4

Isolating Proteases by Ion Exchange Chromatography

Small quantities of the protease encoded by the nucleotide sequence as provided herein are obtained by constructing an expression plasmid containing the relevant DNA sequence, transforming an A. niger strain with this plasmid and growing the A. niger strain in a suitable medium. After collecting the broth free of contaminating cells, the protease sought can be purified.

To isolate the protease as encoded by the provided nucleotide sequence in an essentially pure form several strategies can be followed. All of these strategies have been adequately described in the relevant scientific literature (see for example the Protein Purification Handbook, 18-1132-29 Edition AA as published by Amersham Pharmacia Biotech, Uppsala, Sweden). A procedure which is applicable to purify proteases from complex mixtures is provided hereunder. Essential is that a suitable assay is available that is selective towards the enzyme characteristics sought. For proteases typically a chromogenic, synthetic peptide substrate is used as described in Example 1. Such peptide substrates can be selective towards endoproteases, carboxypeptidases, aminopeptidases or omegapeptidases. In Example 11 the selectivity towards a specific tripeptidylpeptidase is described. By choosing the right amino acid residues in the relevant synthetic peptide, proteases with the desired specificity can be selected.

First it should be determined whether the protease is excreted into the medium. Depending on the expression system chosen to produce the protease, it may be excreted or contained in the cell. If the protease is excreted into the fermentation medium, the producing cells or fragments of these cells have to be removed by centrifugation or filtration, and the resulting clear or clarified medium is the starting point for further purification. In those cases in which the protease sought is not excreted, the producing cells have to be disrupted to enable purification of the protease. In such cases the collected cell mass is best ground with an abrasive, milled with beads, ultrasonicated or subjected to a French press or a Manton-Gaulin homogeniser and then filtered or centrifuged. In case the protease is hydrophobic or membrane bound, the addition of a non-ionic detergent to solubilise the protease before the filtering or centrifugation step may be necessary.

After the clarification step, a three phase purification strategy can be applied to obtain the unknown proteases in an essentially pure state. In all or some of these three phases addition of a detergent may be necessary.

In the first or capture phase the target protease is isolated, partly purified and concentrated. During the subsequent intermediate purification phase most of the bulk impurities are removed, and in the final polishing phase trace amounts of remaining impurities or larger amounts of closely related substances are removed and the enzyme is dissolved in the desired buffer. Depending upon the nature and physical properties of the protease at hand, a person skilled in the art is capable of optimising the three phases using slightly modified versions of the different protein binding materials and applying these under somewhat changed conditions. However, in all cases a selective analytical assay is indispensable as it will enable the continuous monitoring of the increasingly purified proteolytic activity. Analytical assays suitable for the purpose include the use of chromogenic peptide substrates as has been mentioned before.

In the first capturing phase of the purification the clarified and desalted enzyme-containing medium is preferably applied to a strong ion exchange resin of the anionic type. To guarantee binding of the desired proteolytic activity to the resin, three or four different pH values of medium and resin are tested under low conductivity conditions. In these tests the resin is always equilibrated with a buffer of the same pH value and conductivity as the enzyme-containing medium. The medium is then applied to the column under pH conditions which have been shown to allow adequate binding of the protease to the resin, i.e. none of the desired enzymatic activity can be traced back in the run-through medium. Subsequently the desired enzymatic activity is eluted from the ion exchange resin using a continuous salt gradient which starts with the resin equilibration buffer and ends with this buffer to which 1 mol/liter of NaCl has been added. Eluted fractions containing the desired activity according to the assay are pooled and then prepared for an additional purification step. This additional purification step depends on the purity of the desired enzyme in the pooled fraction: if almost pure, an additional gel filtration step will prove to be adequate; if not almost pure, chromatography over a hydrophobic interaction resin is applied followed by a gel filtration step.

Chromatography over a hydrophobic interaction resin is carried out by first increasing the salt content of the pooled fraction obtained from the ion exchange resin to 4 mol/liter of NaCl and by removing any precipitate formed. If the resulting clear fraction does not contain the desired activity, this activity is obviously present in the precipitate and can be recovered in an essentially pure state. If the resulting clear fraction still exhibits the desired activity in the assay, then the liquid is applied as such to a phenyl sepharose resin (Pharmacia) equilibrated in this high salt buffer with an identical pH and conductivity. If the desired enzymatic activity binds to the phenyl sepharose resin, the activity is eluted with a continuous gradient of decreasing salt content followed by a salt free wash and, if necessary, with a chaotropic agent. As before, those fractions from the gradient that exhibit activity in the assay are pooled and finally subjected to a gel filtration step. If the desired enzymatic activity does not bind to the phenyl sepharose resin, many of the contaminants will, so the desired proteolytic activity as present in the void volume of the column requires only an additional ultrafiltration step to obtain the activity in a more concentrated form before applying it to the gel filtration column. The gel filtration column not only removes trace contaminations but also brings the enzyme into the buffer which is required for subsequent use.
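The branching of the purification route described above can be summarised in a small Python sketch. The function and argument names are hypothetical; each flag corresponds to an observation made with the selective assay at the respective stage.

```python
def purification_route(almost_pure, activity_in_clear_fraction=True, binds_to_resin=True):
    """Return the ordered purification steps implied by the assay observations.

    almost_pure                -- pooled ion exchange fractions are almost pure
    activity_in_clear_fraction -- activity remains in solution at 4 mol/liter NaCl
    binds_to_resin             -- activity binds to the phenyl sepharose resin
    """
    steps = [
        "anion exchange capture, elution with 0 to 1 mol/liter NaCl gradient",
        "pool active fractions",
    ]
    if almost_pure:
        return steps + ["gel filtration"]

    steps.append("raise NaCl to 4 mol/liter and remove precipitate")
    if not activity_in_clear_fraction:
        return steps + ["recover activity from precipitate"]

    steps.append("apply clear fraction to phenyl sepharose in high salt buffer")
    if binds_to_resin:
        steps += [
            "elute with decreasing salt gradient, salt free wash, chaotropic agent if necessary",
            "pool active fractions",
        ]
    else:
        steps += [
            "collect activity from the void volume",
            "concentrate by ultrafiltration",
        ]
    return steps + ["gel filtration"]


for step in purification_route(almost_pure=False, binds_to_resin=False):
    print(step)
```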

Although this method is generally applicable for the isolation and purification of proteases according to the invention, a more specific isolation technique is described in Example 5. In that Example the isolation of an Aspergillus protease is described by using immobilised bacitracin, a peptide antibiotic known for its selective interaction with various types of proteases.

Example 5

Isolating Proteases by Affinity Chromatography

An alternative method for purifying small quantities of protease is affinity chromatography. To obtain the protease in a purified form, a 100 milliliter culture is grown in a well aerated shake flask. After centrifugation to remove any non-soluble matter, the supernatant is applied to a 40 milliliter bacitracin-Sepharose column equilibrated with 0.05 mol/liter sodium acetate pH 5.0. Proteases bound to the column are eluted using the acetate buffer supplemented with 1 mol/liter of NaCl and 10% (v/v) isopropanol (J. Appl. Biochem., 1983 pp420-428). Active fractions are collected, dialysed against distilled water and applied to a 20 milliliter bacitracin-Sepharose column, again equilibrated with acetate buffer. As before, elution is carried out using the acetate buffer supplemented with NaCl and isopropanol. Active fractions, i.e. fractions displaying the activities sought, are collected, dialysed against a 5 millimol/liter acetate buffer pH 5.0 and then concentrated by means of ultrafiltration with an Amicon PM-10 membrane. To obtain the protease in an essentially pure state, the concentrated liquid is chromatographed over a Superdex 75 column equilibrated with the 0.05 mol/liter sodium acetate buffer pH 5.0 and supplemented with 0.5 mol/liter NaCl.

Further experiments carried out with the purified enzyme on PAGE may confirm if the molecular weight is in line with what can be expected on the basis of the available sequence data. Final confirmation can be obtained by carrying out a partial, N-terminal amino acid analysis.

Example 6

Properties of a Novel Cysteine Protease from A. niger.

In this Example Aspergillus gene nr 28 was cloned and overexpressed in A. niger as described before. The enzyme obtained was purified according to procedures described in Example 4 and used to destroy trypsin inhibiting activity from soybeans under various conditions. As reference materials papain and bromelain were used. Bromelain was obtained from Sigma, papain was obtained from DSM Food Specialties Business Unit Beverage Ingredients, PO Box 1, 2600 MA Delft, the Netherlands.

Trypsin inhibition was measured according to the method of Kakade, M. L., Rackis, J. J., McGhee, J. E. and Puski, G. (1974): J. Cereal Chemistry 51: 376-382.

Degradation of the substrate N-benzoyl-L-arginine-p-nitroanilide to N-benzoyl-L-arginine and p-nitroaniline was taken as a measure of trypsin activity. Trypsin was obtained from British Drug Houses Ltd and was derived from cow's pancreas containing more than 0.54 Anson Units per gram of product.

The Kunitz inhibitor for soybeans was also obtained from Sigma.

The trypsin inhibitor was pre-incubated at a concentration of 2 mg/ml with the above mentioned cysteine protease enzymes at pH 3 in 50 mM Na-acetate buffer prior to measuring trypsin inhibition. Enzymes were added at a ratio of enzyme protein to trypsin inhibitor of 1:100 (w/w). Albumin served as a negative control for the enzymes. Remaining trypsin activity was measured after incubation during 3 hours at 37° C. Results are shown in Table 2.

TABLE 2
Effects of various cysteine proteases on the enzymatic inactivation of the Kunitz trypsin inhibitor from soybeans.

1                   2               3                   4                   5
Enzyme tested       Remaining TI    Remaining TI        Remaining TI        Remaining TI
                    activity (%)    activity after      activity after      activity after
                                    pepsin treatment    heat treatment      heat treatment
                                                        at 75° C.           at 90° C.
Papain              25              55                  78                  95
Bromelain           30              62                  86                  99
A. niger            26              26                  28                  35
Albumin (control)   100             100                 100                 100

TI: Trypsin Inhibitor activity

Experiments were repeated in the presence of pepsin during the pre-incubation of cysteine proteases with the trypsin inhibitor. Pepsin was added at final concentration of 1.3 mg/ml. Results are shown in column 3.

Another series of experiments was conducted to check for heat stability. The cysteine proteases were incubated at 75 and 90° C. for 5 minutes prior to the addition of these enzymes to the pre-incubation with the trypsin inhibitor. Results are shown in columns 4 and 5.

These results clearly demonstrate the superior activity of these novel cysteine proteases from Aspergillus niger over currently available cysteine proteases for the inactivation of trypsin inhibitors in animal feed.
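To make the comparison in Table 2 more explicit, the following Python sketch converts the remaining TI activities into the fraction of the inactivating effect that each enzyme retains in the presence of pepsin or after heat treatment. The numbers are copied from Table 2; the derived quantity and the function names are an illustrative reading of the table and not a measure defined in this Example.

```python
# Remaining trypsin inhibitor (TI) activity in percent, copied from Table 2.
TABLE_2 = {
    "Papain":    {"untreated": 25, "pepsin": 55, "heat_75": 78, "heat_90": 95},
    "Bromelain": {"untreated": 30, "pepsin": 62, "heat_75": 86, "heat_90": 99},
    "A. niger":  {"untreated": 26, "pepsin": 26, "heat_75": 28, "heat_90": 35},
}


def inactivated(remaining_percent):
    """Percentage of TI activity destroyed, relative to the albumin control (100 %)."""
    return 100 - remaining_percent


def retained_effect(enzyme, condition):
    """Fraction of the untreated enzyme's inactivating effect left under `condition`."""
    row = TABLE_2[enzyme]
    return inactivated(row[condition]) / inactivated(row["untreated"])


for enzyme in TABLE_2:
    summary = {c: round(retained_effect(enzyme, c), 2)
               for c in ("pepsin", "heat_75", "heat_90")}
    print(enzyme, summary)
```

On this reading the A. niger enzyme keeps most of its effect under all three conditions, whereas papain and bromelain lose most of theirs after heating.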

Example 7

Exo-Peptidases Promoting Cheese Ripening and Cheese Taste.

The amino-peptidases encoded by genes nr 20 and 54 (see Table 1) were overexpressed in A. niger according to methods described earlier. Purification of these enzymes was carried out according to procedures as described in Example 4. The activity of the purified enzyme samples was determined at pH 7.2 in an aqueous phosphate buffer (50 mM) containing the para-nitroanilide derivative of a number of hydrophobic amino acids (3 mM) as the substrate. The conversion of the substrate by the amino-peptidase was determined by monitoring the change in optical density at 400 nm as a result of substrate conversion, using a solution not containing the enzyme as the reference. Activity (A) was calculated as the change in OD per minute and expressed as e.g. Phe-AP, Leu-AP or Val-AP units, depending on the substrate used. Normal cheese milk was inoculated with starter culture of the Delvo-tec™ DX 31 range (DSM Food Specialities Delft, The Netherlands) to obtain a Gouda-type cheese, and coagulation was carried out with an average dose of coagulant (50 IMCU per liter of cheese milk). In addition, 25 Phe-units of each exo-protease were added to two experimental cheeses, whereas the control did not contain either one of the exo-proteases. Cheese making parameters conformed to the procedure applied for semi-hard cheese for both cheeses. A difference was noted in terms of flavor and aroma development between the experimental cheeses and the control cheese, to such an extent that the experimental cheeses had obtained most of their organoleptic properties after three (3) weeks whereas the control cheese had obtained a similar qualification after six (6) weeks. The level of free amino acids after three weeks was shown to be twice as high in the experimental cheeses; after six weeks of ripening the levels were comparable again. Amino acid analysis was carried out according to the Picotag method of Waters (Milford Mass., USA).

These data suggest that the product is ready for sale three weeks earlier without decreasing the keeping quality of the cheese. The organoleptic character of the experimental cheeses differed from the control to the extent that the bland cheese flavor with a slight tendency to bitterness of the control cheese was overcome in the experimental cheese in the presence of the amino-peptidase. The texture of the cheeses was found to be somewhat smoother as well.
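The activity calculation used in this Example (change in OD at 400 nm per minute, measured against a reference without enzyme) can be sketched in Python as below. The readings are synthetic, and the function names and the stock activity used in the dosing line are assumptions for illustration only.

```python
def slope_per_minute(times_min, od400):
    """Least-squares slope of OD400 versus time, in OD units per minute."""
    n = len(times_min)
    t_mean = sum(times_min) / n
    y_mean = sum(od400) / n
    sxx = sum((t - t_mean) ** 2 for t in times_min)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_min, od400))
    return sxy / sxx


def aminopeptidase_activity(times_min, od_sample, od_reference):
    """Activity A: OD change per minute, corrected for the enzyme-free reference."""
    return slope_per_minute(times_min, od_sample) - slope_per_minute(times_min, od_reference)


def dose_volume_ml(target_units, units_per_ml):
    """Volume of enzyme preparation that supplies the target number of units."""
    return target_units / units_per_ml


# Synthetic readings (assumed values, not measured data).
times = [0, 1, 2, 3, 4]                               # minutes
sample = [0.10 + 0.050 * t for t in times]            # with enzyme
reference = [0.10 + 0.002 * t for t in times]         # without enzyme

activity = aminopeptidase_activity(times, sample, reference)
volume = dose_volume_ml(target_units=25, units_per_ml=50)  # hypothetical stock activity
print(f"A = {activity:.3f} OD/min, dose volume = {volume:.2f} ml")
```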

Example 8

Novel Specificity of a Protease Encoded by Gene 55

As explained earlier, certain proteins can resist enzymatic hydrolysis as the result of specific amino acid compositions or specific tertiary structures. In such cases the quantity of peptides that can be solubilised from protease resistant proteins can be dramatically improved by using proteases exhibiting novel specificities.

Beta-casein is a protein with very limited tertiary structure but with an extraordinarily high level of proline residues. Many proteases have difficulties in cleaving proline-containing sequences, so that the hydrolysis of beta-casein with commonly available proteases yields a hydrolysate that is relatively rich in large, protease-resistant peptides. These resistant peptides can contribute to a number of undesirable properties of the hydrolysate. For example, it is well known that these larger peptides have a relatively strong effect on allergenicity and bitterness. Moreover, these peptides withstand further degradation into free amino acids, so that in certain processes the occurrence of these large, protease-resistant peptides is synonymous with yield losses. Therefore, the availability and use of proteases that are capable of cleaving the protease-resistant parts of the proteins translate into serious technical and economic benefits.

Beta-casein represents one of the major casein fractions of bovine milk. The protein has been well characterised in terms of its amino acid sequence and is commercially available in an almost pure form. As such, beta-casein offers an excellent test substrate for studying the relationship between enzyme cleavage sites and the length of various peptides formed during enzyme hydrolysis.

This Example demonstrates that despite the broad spectrum cleavage character of the endoprotease subtilisin, the addition of a very specific enzyme like a prolyl endopeptidase as encoded by gene 55 (see Table 1) has a major impact on the size of the beta-casein fragments formed.

Beta-casein from bovine milk (lyophilised, essentially salt-free powder) with a minimum 90% beta-casein was obtained from Sigma. Subtilisin from B. licheniformis (Delvolase®, 560 000 DU per gram) was obtained from DSM Food Specialities (Seclin, France). The proline-specific endoprotease as encoded by gene 55 was overexpressed in A. niger and purified using procedures described in Example 4.

Beta-casein powder was dissolved at a concentration of 10% (w/w) together with 0.1% (w/w) Delvolase™ powder in a 0.1 mol/liter phosphate buffer pH7.0. After an incubation of 24 hours at 45° C. in a shaking waterbath, the reaction was stopped by heating the solution for 15 minutes at 90° C. To one half of the solution (1 ml containing 100 milligrams of beta-casein) 100 microliter of the proline-specific protease was added and the reaction was continued for another 24 hours at 45° C. After another heat shock at 90° C., samples of both the Delvolase™ and the Delvolase™+proline-specific endoprotease treated beta-casein material were analysed by LC/MS equipment to study the precise peptide size distributions in the two samples.

LC/MS Analysis

HPLC using an ion trap mass spectrometer (Thermoquest™, Breda, the Netherlands) coupled to a P4000 pump (Thermoquest™, Breda, the Netherlands) was used in characterising the enzymatic protein hydrolysates produced by the inventive enzyme mixture. The peptides formed were separated using a PEPMAP C18 300A (MIC-15-03-C18-PM, LC Packings, Amsterdam, The Netherlands) column in combination with a gradient of 0.1% formic acid in Milli Q water (Millipore, Bedford, Mass., USA; Solution A) and 0.1% formic acid in acetonitrile (Solution B) for elution. The gradient started at 100% of Solution A and increased to 70% of solution B in 45 minutes and was kept at the latter ratio for another 5 minutes. The injection volume used was 50 microliters, the flow rate was 50 microliter per minute and the column temperature was maintained at 30° C. The protein concentration of the injected sample was approx. 50 micrograms/milliliter.
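The elution program and sample load stated above can be expressed as a short Python sketch. A linear ramp from 0 to 70% of Solution B is assumed here, since the text gives only the start and end points; the function names are hypothetical.

```python
def percent_solution_b(t_min):
    """Percentage of Solution B at time t: ramp to 70 % in 45 min, then a 5 min hold."""
    if t_min < 0 or t_min > 50:
        raise ValueError("time outside the 50 minute program")
    if t_min <= 45:
        return 70.0 * t_min / 45.0  # assumed linear ramp
    return 70.0


def injected_mass_ug(volume_ul, conc_ug_per_ml):
    """Protein mass loaded on the column, in micrograms."""
    return volume_ul / 1000.0 * conc_ug_per_ml


def solvent_used_ml(flow_ul_per_min, run_min):
    """Total eluent volume over the run, in milliliters."""
    return flow_ul_per_min * run_min / 1000.0


load = injected_mass_ug(volume_ul=50, conc_ug_per_ml=50)    # about 2.5 micrograms
eluent = solvent_used_ml(flow_ul_per_min=50, run_min=50)    # about 2.5 milliliters
print(percent_solution_b(22.5), percent_solution_b(48), load, eluent)
```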

Detailed information on the individual peptides was obtained by using the “scan dependent” MS/MS algorithm, which is a characteristic algorithm for an ion trap mass spectrometer. Full scan analysis was followed by zoom scan analysis for the determination of the charge state of the most intense ion in the full scan mass range. Subsequent MS/MS analysis of the latter ion resulted in partial peptide sequence information, which could be used for database searching using the SEQUEST application from Xcalibur Bioworks (Thermoquest™, Breda, The Netherlands). Databanks used were extracted from the OWL.fasta databank, available at the NCBI (National Center for Biotechnology Information), containing the proteins of interest for the application used.

By using this technique as a screening method only peptides with a mass ranging from approx. 400 to 2000 Daltons were considered suitable for further analysis by MS sequencing.

Angiotensin (M=1295.6) was used to tune for optimal sensitivity in MS mode and for optimal fragmentation in MS/MS mode, performing constant infusion of 60 μg/ml, resulting in mainly doubly and triply charged species in MS mode, and an optimal collision energy of about 35% in MS/MS mode.

In the sample digested with Delvolase™ alone, the LC/MS/MS analysis identified 40 peptides covering various parts of the beta-casein molecule. Together these peptides accounted for 79% of the total beta-casein sequence. Different retention times of the peptides on the C18 column could be traced back to peptide lengths ranging from 2 to 23 amino acid residues. Together <15% of the peptides found were smaller than 6 amino acids. The sample digested with Delvolase™ and the proline-specific protease also generated a large number of identifiable peptides from beta-casein. Together these peptides covered >50% of the total beta-casein protein sequence. In this sample the peptide size distribution was remarkably homogeneous, as the peptides ranged in length only between 2 and 6 residues. The results show that the hydrolysate made with the proline-specific protease contains a large fraction of di- and tripeptides and peptides of up to 6 amino acids, showing the distinct beneficial effect of the co-incubation with an endoprotease featuring an unusual specificity. It is also clear from these experiments that gene 55 encodes an endoprotease that cleaves the peptide chain at the carboxy terminus of proline residues.

Example 9

The Selective Release of Specific Amino Acids to Promote Flavour Formation.

Free amino acids like leucine and phenylalanine have been implicated not only in Maillard reactions but also as precursors for desirable aromas in various food fermentations. To promote the formation of such aromas in food fermentations or during the heating, roasting or baking phase of food, it would be advantageous to incorporate into these products a protein hydrolysate that contains relatively high levels of these specific amino acids in a free form. In this Example we describe the production of yeast extracts selectively enriched in leucine and phenylalanine. This enrichment is obtained by combining an endoprotease with a cleavage preference for a selected set of amino acid residues with an exoprotease favouring the release of a similar set of amino acid residues. The preference of the endoprotease should match the preference of the exoprotease used. For example, we have established that the aminopeptidases encoded by genes 20 and 54 (see Table 1) feature a definite preference for releasing leucine and phenylalanine residues, which matches the cleavage preferences of thermolysin. The carboxypeptidases encoded by genes 23 and 24 have a preference for releasing arginine and lysine residues, which matches the cleavage preferences of trypsin. The carboxypeptidase encoded by gene 5 features a highly unusual preference for releasing glycine, which could be combined with certain endoproteases present in papain. The carboxypeptidase encoded by gene 51 is capable of removing glutamate residues, which matches the glutamate specific protease encoded by gene 43.

The endoprotease thermolysin (commercially available as Thermoase C 180 from Daiwa Kasei KK, Osaka, Japan) is known to cleave peptide bonds at the amino terminal side of bulky, hydrophobic amino acids like Leu and Phe. To liberate the thus exposed amino acids from the newly formed peptides, we used the amino-peptidases encoded by genes nr 20 and 54 (see Table 1). These genes were overexpressed in A. niger according to methods described earlier and purification of these enzymes was carried out according to procedures as described in Example 4.

To release as much leucine and phenylalanine as possible without concomitant release of undesired amino acids with this combination of enzymes, it is evident that the conditions used during enzymatic hydrolysis should be carefully selected. Moreover, the yeast's own endogenous (and probably aspecific) proteases have to be inactivated. After a number of test incubations, a protocol was worked out that leads to a surprisingly selective and effective release of leucine and phenylalanine from the yeast proteins using these two new enzymes.

To inactivate the yeasts endogeneous proteases, the yeast suspension was kept for 5 minutes at 95 degrees C. Then the suspension was quickly cooled down to the required temperature and the pH was adjusted to 7.0 using 4N NaOH. The yeast, the thermolysin and one of the aminopeptidases were all incubated simultaneously under the following conditions. After the heat shock, the pH of the 2000 milliliters yeast suspension was adjusted to 7.0 after which 680 milligrams of Thermoase were added and, after stirring, the purified aminopeptidase. The mixture was incubated with stirring at 50 degrees C. for 3 hours and centrifuged. To stop all enzymatic activities the pH of the supernatant was adjusted to 4 and subjected to another heat treatment of 45 minutes at 95 degrees C. After another centrifugation a sample for amino acid analysis was obtained from the supernatant. Precipitated or non-dissolved matter was removed by centrifugation for 15 minutes at 3500 rpm in an Hereaus Megafuge 2.0 R centrifuge. Supernatant was removed and kept frozen at −20° C.

Samples of the supernatant were analysed for amino acid content according to the Picotag method of Waters (Milford Mass., USA) immediately after thawing.

In the amino acid analysis Trp and Cys values were omitted and Asp and Asn values were summed as one value. According to the data obtained, in the resulting hydrolysate the ratio between alanine and leucine (21.3:11.7) was 1:0.5. Commercially available yeast hydrolysates typically exhibit alanine versus leucine ratios of 1:0.3.

In a second experiment a yeast extract was prepared that was enriched in free glutamate. To achieve this, use was made of an endoprotease exhibiting a preference for cleaving at the C-terminal end of glutamate residues (encoded by gene nr 43 in Table 1) and a carboxypeptidase (encoded by gene nr 51 in Table 1) capable of removing these glutamate residues thus exposed. The endoprotease encoded by gene nr 43 and the carboxypeptidase encoded by gene 51 (see Table 1) were overexpressed in A. niger according to methods described earlier. Purification of these enzymes was carried out according to procedures as described in Example 4.

The essential role of free glutamate in a number of aroma forming processes is well documented and MSG, the sodium salt of glutamic acid, is recognized as the single most important taste enhancing component.

In this Example the pH of the 200 ml heat shocked yeast suspension was adjusted to 8.0, then the purified enzyme product encoded by gene 43 was added and the mixture was incubated for 4 hours at 50 degrees C. Then the pH was lowered to 5.0 and the suspension was centrifuged. To 100 milliliters of supernatant the purified gene product of gene 51 was added. Incubation with this carboxypeptidase took place for 30 minutes at 50 degrees C. with continuous pH adjustments. After stopping the enzyme incubation by a heat treatment of 5 minutes at 95 degrees C., the material was again centrifuged (see above) and a sample was obtained for amino acid analysis.

According to the amino acid data obtained (see above), in the resulting hydrolysate the ratio between alanine and glutamate (30.0:48.7) was 1:1.6. Commercially available yeast hydrolysates typically exhibit alanine versus glutamate ratios of 1:1.
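The ratios reported in this Example can be checked with a short calculation. The following is only an illustrative sketch: it normalises each pair of reported amino acid values to alanine = 1 and compares the result with the typical commercial ratios quoted above. The function names are hypothetical. The units of the amino acid values are not stated in the text, which is immaterial here because only ratios are used.

```python
def ratio_to_alanine(alanine, other):
    """Return x such that alanine:other equals 1:x."""
    if alanine <= 0:
        raise ValueError("alanine value must be positive")
    return other / alanine


def fold_enrichment(observed_ratio, commercial_ratio):
    """Return how many times the observed 1:x ratio exceeds the commercial one."""
    return observed_ratio / commercial_ratio


# Values from the text: (alanine, other amino acid, typical commercial x in 1:x)
hydrolysates = {
    "leucine": (21.3, 11.7, 0.3),
    "glutamate": (30.0, 48.7, 1.0),
}

for name, (ala, other, commercial) in hydrolysates.items():
    x = ratio_to_alanine(ala, other)
    print(f"alanine:{name} = 1:{x:.2f}, "
          f"{fold_enrichment(x, commercial):.1f}-fold the commercial ratio")
```

With the reported values this gives about 1:0.55 for leucine and 1:1.62 for glutamate, consistent with the rounded ratios of 1:0.5 and 1:1.6 stated in the text.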

Example 10

Flavour Evaluation of Yeast Hydrolysates Enriched in Specific Amino Acids.

To prove that a protein hydrolysate enriched in specific amino acids according to the invention can generate specific aromas, a number of experiments were carried out with the yeast hydrolysates described in an earlier Example. To that end larger portions of these hydrolysates were prepared and lyophilised. The performance of the resulting powders was compared with the performance of a commercially available yeast extract (Gistex LS, obtainable from DSM Food Specialties, Delft, The Netherlands) in a standardised mixture under several reaction conditions. The standardised mixture consisted of one of the hydrolysates, base mixture and water.

The base mixture contained 22 grams of Maxarome Plus Powder (a specialised yeast extract with a high content of natural nucleotides, also obtainable from DSM Food Specialties), 29.2 grams of glucose, 9 grams of REFEL-F fat (hydrogenated soy oil, obtainable from Barentz, Hoofddorp, The Netherlands) and 0.2 grams of calcium stearoyl lactylate (emulsifier, obtainable from Abitec, Northampton, UK) thoroughly mixed in a mortar.

All standardised mixtures contained 5 grams of yeast hydrolysate powder (i.e. either the leucine or the glutamate enriched material or the commercial yeast extract), 3 grams of the base mixture and 3 grams of water. After thorough mixing, these three slurries were subjected to different heating regimes, i.e. either 65 minutes at 90-95 degrees C. in a reaction vial (liquid reaction), or dried at 20 millibar at 120 degrees C. in a vacuum oven (vacuum roast reaction), or heated in an open reaction vial at 120 degrees C. for 10 minutes after the dissipation of all water (roast reaction).
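The composition of one standardised mixture follows from the quantities given for the base mixture and the slurry. The sketch below is illustrative only and assumes that the base mixture is homogeneous, so that a 3 gram portion contains each ingredient in the same proportion as the whole batch. The function name is hypothetical.

```python
# Base mixture composition in grams, as given in the text.
BASE_MIXTURE_G = {
    "Maxarome Plus Powder": 22.0,
    "glucose": 29.2,
    "REFEL-F fat": 9.0,
    "calcium stearoyl lactylate": 0.2,
}

# Standardised mixture in grams, as given in the text.
HYDROLYSATE_G = 5.0
BASE_PORTION_G = 3.0
WATER_G = 3.0


def standardised_mixture_composition():
    """Return grams of each ingredient in one standardised mixture."""
    base_total = sum(BASE_MIXTURE_G.values())
    composition = {"yeast hydrolysate powder": HYDROLYSATE_G, "water": WATER_G}
    for ingredient, grams in BASE_MIXTURE_G.items():
        # Assumption: the base mixture is homogeneous, so the 3 g portion
        # has the same proportions as the full batch.
        composition[ingredient] = BASE_PORTION_G * grams / base_total
    return composition


composition = standardised_mixture_composition()
total = sum(composition.values())
for ingredient, grams in composition.items():
    print(f"{ingredient}: {grams:.2f} g ({100 * grams / total:.1f}% w/w)")
```

Under this assumption each 11 gram slurry contains roughly 1.45 grams of glucose next to the 5 grams of hydrolysate powder.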

After the heat treatment all three products had assumed colours ranging from dark brown to almost black. In case of the vacuum roast reaction only the light coloured top layers were used. Taste evaluation of the heated products was carried out by grinding the blackened cakes into fine powders and dissolving these powders to a concentration of 2% (w/w) in water containing 0.6% (w/w) NaCl. The observations of the taste panel are specified in Table 3.

TABLE 3

              Reference                    Leucine                             Glutamate
Liquid        Bouillon, slightly roast     Cold tea, slightly flowery, yeasty  More bouillon, meaty, yeasty
Vacuum roast  Burnt, fried potatoes        Astringent, beans, yeasty           Burnt, bouillon, yeasty
Roast         Dark roast, bouillon, umami  Less roast, flowery, umami          Roast, more bouillon, more umami

Example 11

Non-Allergenic Whey Protein Hydrolysates Formed with Tripeptidylpeptidases.

The dipeptidylpeptidases encoded by the genes 19 and 55 as well as the tripeptidylpeptidases encoded by the genes 4, 9, 10, 12, 26, 35, 46, and 50 (see Table 1) may be overproduced as described and may be purified according to the methods provided in Example 4. After purification the pH optimum and the temperature stability of each individual enzyme may be established by any of the methods available and known by the skilled person. Furthermore, the specificity of each individual enzyme may be determined using the methods outlined in Example 1. The selectivity exhibited by tripeptidylpeptidases is illustrated in the following experiment.

The enzyme encoded by gene 12 was overproduced in an Aspergillus niger host cell and purified by procedures described in Example 4. The enzyme thus obtained was incubated at pH 5 and 50 degrees C. with different synthetic chromogenic substrates, i.e. Ala-Ala-Phe-pNA and Ala-Phe-pNA (both from Bachem, Switzerland). The incubation with the Ala-Ala-Phe-pNA substrate led to a significant increase of the absorbance at 410 nm whereas the incubation with Ala-Phe-pNA did not. This observation clearly demonstrates that tripeptidylpeptidases cleave off tripeptides and do not exhibit aminopeptidase activity that can lead to an undesirable increase of free amino acids.

Moreover, the enzyme encoded by gene 12 shows favourable enzyme stability characteristics as shown in the following experiment. Four samples of the enzyme were incubated at pH 5 for one hour at 0, 40, 50 and 60 degrees C. respectively. Then each enzyme sample was incubated with the above mentioned Ala-Ala-Phe-pNA substrate in a citrate buffer at pH 5 and the residual activity in each individual sample was determined by measuring the increase in absorbance at 410 nm. With the 0 degrees C. sample showing 100% activity, the 40 degrees C. sample showed 96% residual activity, the 50 degrees C. sample 92% residual activity and the 60 degrees C. sample 88% residual activity.
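Residual activity as used in this experiment is the rate of absorbance increase of a heat-treated sample expressed as a percentage of the rate of the 0 degrees C. reference sample. The sketch below illustrates that calculation only. The function name is hypothetical, and the two absorbance rates are made-up illustrative numbers, not measurements from this Example.

```python
def residual_activity_percent(rate_treated, rate_reference):
    """Residual activity of a heat-treated sample relative to the reference.

    Both rates are increases in absorbance at 410 nm per unit time,
    measured under identical assay conditions.
    """
    if rate_reference <= 0:
        raise ValueError("reference rate must be positive")
    return 100.0 * rate_treated / rate_reference


# Hypothetical absorbance increases per minute, NOT data from this Example.
reference_rate = 0.050  # sample pre-incubated at 0 degrees C
treated_rate = 0.046    # sample pre-incubated at an elevated temperature

print(f"{residual_activity_percent(treated_rate, reference_rate):.0f}% residual activity")
```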

In a typical process aimed at producing a hydrolysate with a high proportion of tripeptides, whey protein (WPC 75) may be dissolved/suspended in a concentration of 100 grams of protein/liter, in an aqueous medium having a pH of 8.5. The first enzyme incubation is with the broad spectrum endoprotease subtilisin (Delvolase®, 560 000 DU per gram from DSM). After a predigestion of the whey with this enzyme in a concentration of 0.5% enzyme concentrate per gram of protein for 2 hours at 60 degrees C., the mixture is heat-treated to inactivate the endoprotease used. Then the temperature is adjusted to 50 degrees C., the tripeptidylpeptidase is added and the whole mixture is incubated until the desired level of tripeptides is reached. Further processing steps of the hydrolysate thus obtained depend on the specific application but may incorporate microfiltration or centrifugation followed by evaporation and spray drying.
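The endoprotease dose in the predigestion step can be worked out from the figures given. The sketch below is illustrative only and rests on one assumption: "0.5% enzyme concentrate per gram of protein" is read as 0.5% by weight of the protein, i.e. 0.5 grams of concentrate per 100 grams of protein. The function name is hypothetical.

```python
PROTEIN_G_PER_L = 100.0        # whey protein concentration, from the text
DOSE_PERCENT_OF_PROTEIN = 0.5  # assumption: % w/w of concentrate on protein
ACTIVITY_DU_PER_G = 560_000    # declared activity of the concentrate, from the text


def endoprotease_dose(volume_l):
    """Return (grams of concentrate, total DU, DU per gram of protein)."""
    protein_g = PROTEIN_G_PER_L * volume_l
    concentrate_g = protein_g * DOSE_PERCENT_OF_PROTEIN / 100.0
    total_du = concentrate_g * ACTIVITY_DU_PER_G
    return concentrate_g, total_du, total_du / protein_g


grams, total_du, du_per_g_protein = endoprotease_dose(volume_l=1.0)
print(f"{grams:.2f} g concentrate, {total_du:.0f} DU, "
      f"{du_per_g_protein:.0f} DU per g protein")
```

Under this reading, one liter of the whey suspension receives 0.5 grams of concentrate, corresponding to 280 000 DU or 2800 DU per gram of protein.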

1. An isolated polynucleotide that encodes a tripeptidyl peptidase wherein said polynucleotide hybridizes to the full length complement of SEQ ID NO: 9; SEQ ID NO: 10; SEQ ID NO: 12; SEQ ID NO: 35; SEQ ID NO: 50; SEQ ID NO: 66; SEQ ID NO: 67; SEQ ID NO: 69; SEQ ID NO: 92; or SEQ ID NO: 107 under hybridization conditions of 5×SSC, 5× Denhardt's solution, and 1.0% SDS at 68° C. and wash conditions of 0.2×SSC and 0.1% SDS at room temperature.
2. An isolated polynucleotide encoding a tripeptidyl peptidase, which comprises a nucleotide sequence selected from the group consisting of SEQ ID NO: 9; SEQ ID NO: 10; SEQ ID NO: 12; SEQ ID NO: 35; SEQ ID NO: 50; SEQ ID NO: 66; SEQ ID NO: 67; SEQ ID NO: 69; SEQ ID NO: 92; and SEQ ID NO: 107.
3. The isolated polynucleotide of claim 1 obtainable from a filamentous fungus.
4. The isolated polynucleotide of claim 3 obtainable from Aspergillus niger.
5. An isolated polynucleotide encoding a polypeptide having tripeptidyl peptidase activity, said polypeptide comprising an amino acid sequence of SEQ ID NO: 123; SEQ ID NO: 124; SEQ ID NO: 126; SEQ ID NO: 149; or SEQ ID NO: 164.
6. An isolated polynucleotide encoding a polypeptide having tripeptidyl peptidase activity, which is at least 95% identical to SEQ ID NO: 123; SEQ ID NO: 124; SEQ ID NO: 126; SEQ ID NO: 149; or SEQ ID NO: 164.
7. The isolated polynucleotide of claim 1, which hybridizes to the full-length complement of SEQ ID NO: 66; SEQ ID NO: 67; SEQ ID NO: 69; SEQ ID NO: 92; or SEQ ID NO: 107.
8. A vector comprising the polynucleotide sequence of claim 1.
9. The vector of claim 8 wherein said polynucleotide sequence is operatively linked with regulatory sequences suitable for expression of said polynucleotide sequence in a suitable host cell.
10. The vector of claim 9 wherein said suitable host cell is a filamentous fungus.
11. A method to prepare a tripeptidyl peptidase comprising the steps of culturing a host cell comprising the vector of claim 9 and isolating said tripeptidyl peptidase from said host cell.
12. An isolated polypeptide having tripeptidyl peptidase activity, which comprises an amino acid sequence selected from the group consisting of SEQ ID NO: 123; SEQ ID NO: 124; SEQ ID NO: 126; SEQ ID NO: 149; and SEQ ID NO: 164.
13. An isolated polypeptide obtainable by expressing the polynucleotide of claim 1 in a host cell.
14. An isolated polypeptide having tripeptidyl peptidase activity, which comprises an amino acid sequence at least 95% identical to SEQ ID NO: 123; SEQ ID NO: 124; SEQ ID NO: 126; SEQ ID NO: 149; or SEQ ID NO: 164.
15. An isolated recombinant host cell comprising the vector of claim 9.
16. An isolated recombinant host cell expressing the polypeptide of claim 14.
17. The isolated recombinant host cell of claim 15 wherein said host cell is from an Aspergillus species.
18. A fusion protein comprising the amino acid sequence according to claim 14.
19. A fusion protein comprising the amino acid sequence according to claim 12.
20. A vector comprising the polynucleotide sequence of claim 6.
21. The vector of claim 20 wherein said polynucleotide sequence is operatively linked with regulatory sequences suitable for expression of said polynucleotide sequence in a suitable host cell.
22. A method to prepare a tripeptidyl peptidase comprising the steps of culturing a host cell comprising the vector of claim 21 and isolating said tripeptidyl peptidase from said host cell.
23. An isolated recombinant host cell comprising the vector of claim 21.
24. The isolated polynucleotide of claim 6 wherein said polypeptide is at least 98% identical to SEQ ID NO: 123; SEQ ID NO: 124; SEQ ID NO: 126; SEQ ID NO: 149; or SEQ ID NO: 164.
25. The isolated polypeptide of claim 14 wherein said polypeptide comprises an amino acid sequence at least 98% identical to SEQ ID NO: 123; SEQ ID NO: 124; SEQ ID NO: 126; SEQ ID NO: 149; or SEQ ID NO: 164.